Context: I've read this SO thread discussing the differences between CompletableFuture and Thread.
But I'm trying to understand when should I use new Thread() instead of runAsync().
I understand that runAsyn() is more efficient for short/one-time parallel task because my program might avoid the overhead of creating a brand new thread, but should I also consider it for long running operations?
What are the factors that I should be aware before considering to use one over the other?
Thanks everyone.
The difference between using the low-level concurrency APIs (such as Thread) and others is not just about the kind of work that you want to get done, it's also about the programming model and also how easy they make it to configure the environment in which the task runs.
In general, it is advisable to use higher-level APIs, such as CompletableFuture instead of directly using Threads.
I understand that runAsyn() is more efficient for short/one-time parallel task
Well, maybe, assuming you call runAsync, counting on it to use the fork-join pool. The runAsync method is overloaded with a method that allows one to specify the java.util.concurrent.Executor with which the asynchronous task is executed.
Using an ExecutorService for more control over the thread pool and using CompletableFuture.runAsync or CompletableFuture.supplyAsync with a specified executor service (or not) are both generally preferred to creating and running a Thread object directly.
There's nothing particularly for or against using the CompletableFuture API for long-running tasks. But the choice one makes to use Threads has other implications as well, among which:
The CompletableFuture gives a better API for programming reactively, without forcing us to write explicit synchronization code. (we don't get this when using threads directly)
The Future interface (which is implemented by CompletableFuture) gives other additional, obvious advantages.
So, in short: you can (and probably should, if the alternative being considered is the Thread API) use the CompletableFuture API for your long-running tasks. To better control thread pools, you can combine it with executor services.
The main difference is CompletableFuture run your task by default on the ForkJoinPool.commonPool. But if you create your own thread and start it will execute as a single thread, not on a Thread pool. Also if you want to execute some task in a sequence but asynchronously. Then you can do like below.
CompletableFuture.runAsync(() -> {
System.out.println("On first task");
System.out.println("Thread : " + Thread.currentThread());
}).thenRun(() -> {
System.out.println("On second task");
});
Output:
On first task
Thread : Thread[ForkJoinPool.commonPool-worker-1,5,main]
On second task
If you run the above code you can see that which pool CompletableFuture is using.
Note: Threads is Daemon in ForkJoinPool.commonPool.
Related
I am trying to mimic what single threaded async programming in Javascript in Java with the use of async / await library by EA (ea-async). This is mainly because I do not have long-lasting CPU bound computations in my program and I want to code single thread lock free code in Java.
ea-async library heavily relies on the CompletableFuture in Java and underneath Java seems to use ForkJoinPool to run the async callbacks. This puts me into multi threaded environment as my CPU is multi-core. It seems for every CompletableFuture task, I can supply async with my custom thread pool executor. I can supply Executors.newSingleThreadExecutor() for this but I need a way to set this globally so that all CompletableFuture will be using this executor within the single JVM process. How do I do this?
ea-async library heavily relies on the CompletableFuture in Java and
underneath Java seems to use ForkJoinPool to run the async callbacks.
That is the default behavior of CompleteableFuture:
All async methods without an explicit Executor argument are performed
using the ForkJoinPool.commonPool() (unless it does not support a
parallelism level of at least two, in which case, a new Thread is
created to run each task). This may be overridden for non-static
methods in subclasses by defining method defaultExecutor().
That's a defined characteristic of the class, so if you're using class CompleteableFuture, not a subclass, and generating instances without specifying an Executor explicitly, then a ForkJoinPool is what you're going to get.
Of course, if you are in control of the CompletableFutures provided to ea-async then you have the option to provide instances of a subclass that defines defaultExecutor() however you like. Alternatively, you can create your CompleteableFuture objects via the static factory methods that allow you to explicitly specify the Executor to use, such as runAsync(Runnable, Executor).
But that's probably not what you really want to do.
If you use an executor with only one thread, then your tasks can be executed asynchronously with respect to the thread that submits them, yes, but they will be serialized with respect to each other. You do get only one thread working on them, but it will at any time be working on a specific one, sticking with that one only until it finishes, regardless of the order in which the responses actually arrive. If that's satisfactory, then it's unclear why you want async operations at all.
This puts me into multi threaded environment as my CPU is multi-core.
It puts you in multiple threads regardless of how many cores your CPU has. That's what Executors do, even Executors.newSingleThreadExecutor(). That's the sense of "asynchronous" they provide.
If I understand correctly, you are instead looking to use one thread to multiplex I/O to multiple remote web applications. That is what java.nio.channels.Selector is for, but using that generally requires either managing the I/O operations yourself or using interfaces designed to interoperate with selectors. If you are locked in to third-party interfaces that do not afford use of a Selector, then multithreading and multiprocessing are your only viable alternatives.
In comments you wrote:
I'm starting to think maybe BlockingQueue might do the job in
consolidating all API responses into one queue as tasks where a single
thread will work on them.
Again, I don't think that you want everything that comes with that, and if in fact you do, then I don't see why it wouldn't be even better and easier to work synchronously instead of asynchronously.
I know that shutdown() and awaitTermination() exist. The problem is that the runnables in the pool need to be able to add an unknown number (can't use a countdownlatch) of other runnables to it and if I call shutdown() those tasks will be rejected. How can I know when they're done?
Work with Future rather than with Runnable. There's this Future#isDone method that may help you.
In case you don't have anything meaningful to return from the Callable, use Callable<Void> and Future<Void>.
Instead of submitting Runnable tasks to an Executor, you should rather use ForkJoinTask/ForkJoinPool instead. A ForkJoinTask runs inside a ForkJoinPool and can spawn an arbitrary number of (sub)tasks and wait for them to complete, without actually blocking the current thread. A ForkJoinTask is complete when all of its sub-tasks are done, so the entire computation is done, when the initial (root) ForkJoinTask is complete.
See Oracle - The Java™ Tutorials - Fork/Join for details.
As all of your tasks are resultless (Runnable), you should subclass RecursiveAction (which is itself a subclass of ForkJoinTask). Implement the method compute(), and spawn an arbitrary number of new tasks there by either calling invoke(subtask), invokeAll(subtask1, subtask2, ...) or subtask.fork() followed by subtask.join().
The entire computation is executed as follows:
MyRecursiveAction task = new MyRecursiveAction(params);
ForkJoinPool pool = new ForkJoinPool(numberOfThreads);
pool.invoke(task); // will block until task is done
Unfortunatley the advantages of Fork/Join have some limitations, e.g.:
(...) Computations should ideally avoid synchronized methods or blocks, and
should minimize other blocking synchronization apart from joining
other tasks or using synchronizers such as Phasers that are advertised
to cooperate with fork/join scheduling. Subdividable tasks should also
not perform blocking I/O, and should ideally access variables that are
completely independent of those accessed by other running tasks. These
guidelines are loosely enforced by not permitting checked exceptions
such as IOExceptions to be thrown. (...)
For more detail see API docs of ForkJoinTask.
If you are able to use Guava Futures, you can use Futures.allAsList or Futures.successfulAsList. This allows you to wrap a number of Future instances that you got back from the ExecutorService into a single Future which you can then check to see if it is finished using isDone() (or just get(), for that matter, if you want to block until completion).
What benefit is there to use Executors over just Threads in a Java program.
Such as
ExecutorService pool = Executors.newFixedThreadPool(2);
void someMethod() {
//Thread
new Thread(new SomeRunnable()).start();
//vs
//Executor
pool.execute(new SomeRunnable());
}
Does an executor just limit the number of threads it allows to have running at once (Thread Pooling)? Does it actually multiplex runnables onto the threads it creates instead? If not is it just a way to avoid having to write new Thread(runnable).start() every time?
Yes, executors will generally multiplex runnables onto the threads they create; they'll constrain and manage the number of threads running at once; they'll make it much easier to customize concurrency levels. Generally, executors should be preferred over just creating bare threads.
Creating new threads is expensive. Because Executors uses a thread pool, you get to easily reuse threads, resulting in better performance.
Does an executor just limit the number of threads it allows to have running at once (Thread Pooling)?
Executors#newFixedThreadPool(int), Executors#newSingleThreadExecutor do this, each one under different terms (read the proper javadoc to know more about it).
Does it actually multiplex runnables onto the threads it creates instead?
Yes
If not is it just a way to avoid having to write new Thread(runnable).start() every time?
ExecutorService helps you to control the way you handle threads. Of course, you can do this manually, but there's no need to reinvent the wheel. Also, there are other functionalities that ExecutorService provides you like executing asynchronous tasks through the usage of Future instances.
There are multiple concerns related to thread.
managing threads
resource utilization
creation of thread
Executors provides different kind of implementation for creating a pool of threads. Also thread creation is a costly affair. Executors creates and manages these threads internally. Details about it can be found in the below link.
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
As I said over in a related question, Threads are pretty bad. Executors (and the related concurrency classes) are pretty good:
Caveat: Around here, I strongly discourage the use of raw Threads. I
much prefer the use of Callables and FutureTasks (From the javadoc: "A
cancellable asynchronous computation"). The integration of timeouts,
proper cancelling and the thread pooling of the modern concurrency
support are all much more useful to me than piles of raw Threads.
For example, I'm currently replacing a legacy piece of code that used a disjoint Thread running in a loop with a self-timer to determine how long it should Thread.sleep() after each iteration. My replacement will use a very simple Runnable (to hold a single iteration), a ScheduledExecutorService to run one of the iterations and the Future resulting from the scheduleAtAFixedRate method to tune the timing between iterations.
While you could argue that replacement will be effectively equivalent to the legacy code, I'll have replaced an arcane snarl of Thread management and wishful thinking with a compartmentalized set of functionality that separates the concerns of the GUI (are we currently running?) from data processing (playback at 5x speed) and file management (cancel this run and choose another file).
Is there an ExecutorService that allows an existing thread to perform the executions instead of spawning new threads? Bonus if it’s a ScheduledExecutor. Most executors spawn worker threads to do the execution, but I want the worker thread to be an existing thread that I’m on. Here's the API that I imagine:
while (!executor.isTerminated()) {
Runnable r = executor.take();
r.run();
}
This is similar to the way that SWT and JavaFX allow the main thread to dispatch events, as opposed to Swing, which requires its own event dispatch thread to be spawned to handle events.
Motivation: I currently have lots of places where a thread spawn a new executor and then just calls awaitTermination() to wait for it to finish. I’d like to save some resources and keep the stack traces from being split in two.
Note that I don’t want an executor that runs tasks in execute(Runnable)’s caller threads, which is what this answer and Guava’s MoreExecutors.sameThreadExecutor() do.
Most executors from java.util.concurrent behave exactly as you supposed. Some spawn additional threads when there are too many tasks, but usually they can be configured to set a limit.
To exploit such a behaviour, do not start new executor each time - use the same executor. To wait for a set of tasks to finish, use invokeAll(), or submit() and then future.get()
I'm assuming what you want is control over the creation of new threads, such as name, daemon-status, etc. Use a ThreadFactory:
public class MyThreadFactory implements ThreadFactory {
public Thread newThread(Runnable runnable) {
Thread t = new Thread(runnable, "MyThreadName");
t.setDaemon(true);
return t;
}
}
This allows you to control thread creation so that the execution happens in threads that you manufacture instead of some default thread from a default ThreadFactory.
Then to use it, all of the methods in Executors take a ThreadFactory:
Executors.newExecutorOfSomeKind(new MyThreadFactory());
Edit: I see what you mean now. Unfortunately, the behavior of all Executor implementations (as far as I'm aware) is to create new threads to run the task, except the sameThreadExecutor you mentioned. Without going through the Thread objects that are creating executors just to execute one task (which is a horrible design -- see comments for what I mean by this), there's no easy way to accomplish what you want. I would recommend changing the code to use a single Executor with something like an ExecutorCompletionService (see this question) or use a fork/join pattern. Fork/join is made easier in Java 7 (see this Java trail). For pre-Java 7 code, read up on the counting Semaphore in Java (and in general).
Executor seems like a clean abstraction. When would you want to use Thread directly rather than rely on the more robust executor?
To give some history, Executors were only added as part of the java standard in Java 1.5. So in some ways Executors can be seen as a new better abstraction for dealing with Runnable tasks.
A bit of an over-simplification coming... - Executors are threads done right so use them in preference.
I use Thread when I need some pull based message processing. E.g. a Queue is take()-en in a loop in a separate thread. For example, you wrap a queue in an expensive context - lets say a JDBC connection, JMS connection, files to process from single disk, etc.
Before I get cursed, do you have some scenario?
Edit:
As stated by others, the Executor (ExecutorService) interface has more potential, as you can use the Executors to select a behavior: scheduled, prioritized, cached etc. in Java 5+ or a j.u.c backport for Java 1.4.
The executor framework has protection against crashed runnables and automatically re-create worker threads. One drawback in my opinion, that you have to explicitly shutdown() and awaitTermination() them before you exit your application - which is not so easy in GUI apps.
If you use bounded queues you need to specify a RejectedExecutionHandler or the new runnables get thrown away.
You might have a look at Brian Goetz et al: Java Concurrency in Practice (2006)
There is no advantage to using raw threads. You can always supply Executors with a Thread factory, so even the option of custom thread creation is covered.
You don't use Thread unless you need more specific behaviour that is not found in Thread itself. You then extend Thread and add your specifically wanted behaviour.
Else just use Runnable or Executor.
Well, I thought that a ThreadPoolExecutor provided better performance for it manages a pool of threads, minimizing the overhead of instantiating a new thread, allocating memory...
And if you are going to launch thousands of threads, it gives you some queuing functionality you would have to program by yourself...
Threads & Executors are different tools, used on different scenarios... As I see it, is like asking why should I use ArrayList when I can use HashMap? They are different...
java.util.concurrent package provides executor interface and can be used to created thread.
The Executor interface provides a single method, execute, designed to be a drop-in replacement for a common thread-creation idiom. If r is a Runnable object, and e is an Executor object you can replace
(new Thread(r)).start();
with
e.execute(r);
Refer here
It's always better to prefer Executor to Thread even for single thread as below
ExecutorService fixedThreadPool = Executors.newFixedThreadPool(1);
You can use Thread over Executor in below scenarios
Your application needs limited thread(s) and business logic is simple
If simple multi-threading model caters your requirement without Thread Pool
You are confident of managing thread(s) life cycle + exception handling scenarios with help of low level APIs in below areas : Inter thread communication, Exception handling, reincarnation of threads due to unexpected errors
and one last point
If your application does not need customization of various features of ThreadPoolExecutor
ThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime,
TimeUnit unit, BlockingQueue<Runnable> workQueue, ThreadFactory threadFactory,
RejectedExecutionHandler handler)
In all other cases, you can go for ThreadPoolExecutor