How to globally set thread pool for all CompletableFuture

I am trying to mimic JavaScript-style single-threaded async programming in Java, using EA's async/await library (ea-async). This is mainly because my program has no long-running CPU-bound computations, and I want to write single-threaded, lock-free code in Java.
The ea-async library relies heavily on CompletableFuture, and underneath, Java seems to use ForkJoinPool to run the async callbacks. This puts me in a multithreaded environment, since my CPU is multi-core. It seems that for every CompletableFuture task I can pass my own thread pool executor to the async method. I could supply Executors.newSingleThreadExecutor() for this, but I need a way to set it globally so that every CompletableFuture within the JVM process uses this executor. How do I do this?

ea-async library heavily relies on the CompletableFuture in Java and
underneath Java seems to use ForkJoinPool to run the async callbacks.
That is the default behavior of CompletableFuture:
All async methods without an explicit Executor argument are performed
using the ForkJoinPool.commonPool() (unless it does not support a
parallelism level of at least two, in which case, a new Thread is
created to run each task). This may be overridden for non-static
methods in subclasses by defining method defaultExecutor().
That's a defined characteristic of the class, so if you're using class CompletableFuture, not a subclass, and creating instances without specifying an Executor explicitly, then a ForkJoinPool is what you're going to get.
Of course, if you are in control of the CompletableFutures provided to ea-async, then you have the option to provide instances of a subclass that defines defaultExecutor() (available since Java 9) however you like. Alternatively, you can create your CompletableFuture objects via the static factory methods that let you specify the Executor explicitly, such as runAsync(Runnable, Executor).
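A minimal sketch of the subclass approach, assuming Java 9 or later (where the overridable defaultExecutor() and newIncompleteFuture() methods exist). LoopFuture, LOOP and threadsUsed are hypothetical names. Overriding newIncompleteFuture() matters: without it, dependent stages would be plain CompletableFutures and would fall back to the common pool. Static factories such as CompletableFuture.supplyAsync(Supplier) still use the common pool, and whether ea-async's generated code keeps using your subclass has not been verified here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LoopFutureDemo {
    // Hypothetical application-wide "event loop": one daemon worker thread for all async stages.
    static final ExecutorService LOOP = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "event-loop");
        t.setDaemon(true); // do not keep the JVM alive just for this thread
        return t;
    });

    // Async methods without an Executor argument now default to LOOP, not ForkJoinPool.commonPool().
    static final class LoopFuture<T> extends CompletableFuture<T> {
        @Override
        public Executor defaultExecutor() {
            return LOOP;
        }

        // Makes dependent stages (thenApply, thenCompose, ...) LoopFutures too,
        // so they inherit the same default executor.
        @Override
        public <U> CompletableFuture<U> newIncompleteFuture() {
            return new LoopFuture<>();
        }
    }

    // Returns the names of the threads that ran two chained async stages.
    static String threadsUsed() {
        LoopFuture<String> start = new LoopFuture<>();
        CompletableFuture<String> chain = start
                .thenApplyAsync(s -> s + Thread.currentThread().getName())
                .thenApplyAsync(s -> s + "," + Thread.currentThread().getName());
        start.complete("");
        return chain.join();
    }

    public static void main(String[] args) {
        System.out.println(threadsUsed()); // event-loop,event-loop
    }
}
```

Every callback runs on the one "event-loop" thread, but, as the answer goes on to explain, that serializes the callbacks rather than turning blocking I/O into non-blocking I/O.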
But that's probably not what you really want to do.
If you use an executor with only one thread, then your tasks can be executed asynchronously with respect to the thread that submits them, yes, but they will be serialized with respect to each other. You do get only one thread working on them, but it will at any time be working on a specific one, sticking with that one only until it finishes, regardless of the order in which the responses actually arrive. If that's satisfactory, then it's unclear why you want async operations at all.
This puts me into multi threaded environment as my CPU is multi-core.
It puts you in multiple threads regardless of how many cores your CPU has. That's what Executors do, even Executors.newSingleThreadExecutor(). That's the sense of "asynchronous" they provide.
If I understand correctly, you are instead looking to use one thread to multiplex I/O to multiple remote web applications. That is what java.nio.channels.Selector is for, but using that generally requires either managing the I/O operations yourself or using interfaces designed to interoperate with selectors. If you are locked in to third-party interfaces that do not afford use of a Selector, then multithreading and multiprocessing are your only viable alternatives.
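A minimal single-threaded sketch of the Selector idea. Local Pipe channels stand in for remote connections (an assumption made so the example runs without a network), and SelectorDemo and readAll are hypothetical names. One thread blocks in select() and reads whichever channel is ready, so responses are handled in the order they become available rather than in submission order.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SelectorDemo {
    // Reads every "response" to end-of-stream using ONE thread and one Selector.
    // Messages are assumed small enough to fit in a pipe's buffer.
    static List<String> readAll(String... messages) {
        List<String> received = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (String msg : messages) {
                Pipe pipe = Pipe.open();
                pipe.source().configureBlocking(false); // required before register()
                // The attachment accumulates this channel's bytes until EOF.
                pipe.source().register(selector, SelectionKey.OP_READ, new ByteArrayOutputStream());
                pipe.sink().write(ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8)));
                pipe.sink().close(); // the "remote side" is finished, so the reader will see EOF
            }
            ByteBuffer buf = ByteBuffer.allocate(64);
            int open = messages.length;
            while (open > 0) {
                selector.select(); // blocks until at least one channel is readable
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    Pipe.SourceChannel source = (Pipe.SourceChannel) key.channel();
                    ByteArrayOutputStream acc = (ByteArrayOutputStream) key.attachment();
                    buf.clear();
                    int n = source.read(buf);
                    if (n > 0) {
                        acc.write(buf.array(), 0, n);
                    } else if (n < 0) { // end of stream: this message is complete
                        received.add(new String(acc.toByteArray(), StandardCharsets.UTF_8));
                        source.close(); // also cancels the key
                        open--;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        // Completion order depends on readiness, not on submission order.
        System.out.println(readAll("response A", "response B", "response C"));
    }
}
```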
In comments you wrote:
I'm starting to think maybe BlockingQueue might do the job in
consolidating all API responses into one queue as tasks where a single
thread will work on them.
Again, I don't think that you want everything that comes with that, and if in fact you do, then I don't see why it wouldn't be even better and easier to work synchronously instead of asynchronously.

Related

"Cannot reproduce" - is Java deterministic multithreading possible?

Is it possible to run a multithreaded Java application in a deterministic fashion? I mean to always have the same thread switching in two different runs of my application.
The reason is to run a simulation under exactly the same conditions in every run.
A similar case is giving a random number generator an arbitrary seed to always obtain the same "random" sequence.

I am not aware of any practical way to do this.
In theory, it would be possible to implement a bytecode interpreter with entirely deterministic behavior under certain assumptions[1]. You would need to simulate the multiple threads by implementing the threads and the thread scheduling entirely in software, using a single native thread.
[1] For example, no I/O, and no use of the system clock.
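A toy illustration of simulating threads in software on a single native thread, far simpler than a bytecode interpreter. SimThread, run and the fixed three-step tasks are hypothetical. The scheduler chooses the next "thread" with a seeded java.util.Random, so, like the seeded RNG in the question, the same seed reproduces the same interleaving.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class DeterministicScheduler {
    // A hypothetical "simulated thread": a name plus a count of steps.
    static final class SimThread {
        final String name;
        final int totalSteps;
        int nextStep = 0;

        SimThread(String name, int totalSteps) {
            this.name = name;
            this.totalSteps = totalSteps;
        }
    }

    // All simulated threads run on the calling native thread. The seed alone
    // decides which thread takes the next step, so the trace is reproducible.
    static List<String> run(long seed) {
        Random rng = new Random(seed);
        List<SimThread> ready = new ArrayList<>();
        ready.add(new SimThread("A", 3));
        ready.add(new SimThread("B", 3));
        ready.add(new SimThread("C", 3));
        List<String> trace = new ArrayList<>();
        while (!ready.isEmpty()) {
            SimThread t = ready.get(rng.nextInt(ready.size()));
            trace.add(t.name + t.nextStep); // "execute" one step and record it
            t.nextStep++;
            if (t.nextStep == t.totalSteps) {
                ready.remove(t);
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run(42));
        System.out.println(run(42).equals(run(42))); // true: same seed, same interleaving
    }
}
```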

No, it is not possible (other than by simulating it yourself) to have multiple threads interleave in the same way each time around. Threads are not designed to do that.
If you want deterministic results, don't use threads.

As OldCurmudgeon said, it's not possible with multithreading.
If you decide to use a single thread, I prefer newSingleThreadExecutor to a plain Thread because of its flexibility and other advantages.
Use newSingleThreadExecutor from Executors:
public static ExecutorService newSingleThreadExecutor()
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.)
Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
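A small sketch of the guarantee quoted above: with one worker thread there is no interleaving between tasks, so they run in submission order on every run. SingleThreadOrderDemo and runOrder are hypothetical names. This makes the order of tasks reproducible; it does not make an arbitrary multithreaded program deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleThreadOrderDemo {
    // Submits n tasks to one single-thread executor and returns the order in which they ran.
    static List<Integer> runOrder(int n) {
        ExecutorService single = Executors.newSingleThreadExecutor();
        List<Integer> order = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < n; i++) {
            final int id = i;
            single.execute(() -> order.add(id)); // one task at a time, in submission order
        }
        single.shutdown();
        try {
            single.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(runOrder(5)); // [0, 1, 2, 3, 4]
    }
}
```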
Related SE questions:
Difference between Executors.newFixedThreadPool(1) and Executors.newSingleThreadExecutor()
ExecutorService vs Casual Thread Spawner

Is there a default thread pool in java

I can create a new threadpool in Java and execute tasks on it using the Executors.newFixedThreadPool and ExecutorService.submit methods.
Is there a 'default' threadpool that I can reuse for all executor services in my java program? Or do I just have to create a singleton that contains a default threadpool? C# has a default threadpool that runs tasks when the Task.Factory.StartNew method is called.

Since Java 8 there's ForkJoinPool.commonPool(), which is used by default by many methods involving parallel or asynchronous execution. For example, Arrays.parallelSort() and parallel Stream API operations use this pool. You can submit your own tasks to this pool using many methods of the CompletableFuture class, such as CompletableFuture.supplyAsync().
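A short sketch of both ways to use the common pool: implicitly, through a CompletableFuture async method with no Executor argument, and explicitly, through ForkJoinPool.commonPool().submit(...). CommonPoolDemo and its method names are hypothetical; the arithmetic only stands in for real work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CommonPoolDemo {
    // No Executor argument, so this runs on ForkJoinPool.commonPool()
    // (or on a new thread per task if the common pool's parallelism is below two).
    static int sumOfSquaresAsync(int n) {
        return CompletableFuture
                .supplyAsync(() -> IntStream.rangeClosed(1, n).map(i -> i * i).sum())
                .join();
    }

    // Tasks can also be submitted to the common pool directly.
    static int answerFromCommonPool() {
        return ForkJoinPool.commonPool().submit(() -> 6 * 7).join();
    }

    public static void main(String[] args) {
        System.out.println("parallelism = " + ForkJoinPool.commonPool().getParallelism());
        System.out.println(sumOfSquaresAsync(10)); // 385
        System.out.println(answerFromCommonPool()); // 42
    }
}
```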

Using separate threadpools is good, default practice, and sharing threadpools is a (possibly premature) optimization.
Through Java 7 the answer is no, there is not a default threadpool, and the recommendation is to have many threadpools. It's good separation and will prevent blocking behavior on one collection of tasks from interfering with another.
If you share threadpools you should ask questions like:
Will the logging framework be able to distinguish tasks? (The thread a task runs on is one way to distinguish them.)
If task pool A accidentally requests way too many threads and gets cut off, should task pool B starve? When you notice task pool B is failing will you be able to diagnose the problem in task pool A?
If pool A blocks should B starve?
Maybe you create something like a LightweightThreadpool. And the first 5 tasks you write use it in a lightweight fashion. And the 6th task... does, except it also writes errors to disk, and those errors are surprisingly big, and sometimes there's many of them, and they're not throttled. Suddenly the first 5 tasks are starved and have no idea what hit them, and furthermore, when you wrote those tasks, you really believed they were secure and might not have prepared for this type of incident.
So sharing threadpools is about as okay as having two different processes run on the same server is okay. You should think about resource management very carefully first and understand that the tasks are resource-coupled now. The lack of a default threadpool is trying to force you to use separate ones by default, and think about these questions carefully before sharing one.
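A sketch of two of the points above, under assumed names (SeparatePoolsDemo, named and isolationCheck are hypothetical): each pool gets a ThreadFactory that names its threads, which helps logs tell task groups apart, and blocking pool A's only thread leaves pool B unaffected. With one shared single-thread pool, the second task would wait behind the blocked one.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SeparatePoolsDemo {
    // Hypothetical helper: names each thread after its pool, so logs and thread dumps
    // show which task group a thread belongs to.
    static ThreadFactory named(String poolName) {
        AtomicInteger counter = new AtomicInteger();
        return r -> new Thread(r, poolName + "-" + counter.incrementAndGet());
    }

    // Blocks pool A's only thread, then shows that pool B still runs tasks.
    static String isolationCheck() {
        ExecutorService poolA = Executors.newFixedThreadPool(1, named("pool-a"));
        ExecutorService poolB = Executors.newFixedThreadPool(1, named("pool-b"));
        CountDownLatch release = new CountDownLatch(1);
        try {
            poolA.execute(() -> {
                try {
                    release.await(); // simulates a task in pool A that blocks
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Pool B is unaffected; a shared pool of size 1 would stall here instead.
            return poolB.submit(() -> Thread.currentThread().getName()).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            poolA.shutdown();
            poolB.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(isolationCheck()); // pool-b-1
    }
}
```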
As of Java 8 the answer is "yes" (per Tagir's answer on this question). But you will notice everything will start horribly failing if you submit blocking tasks to that threadpool.

Android AsyncTask.THREAD_POOL_EXECUTOR vs custom ThreadPool with Runnables

I have some tasks that I need to process concurrently on Android and I would like to use some sort of a thread pool to do so. I couldn't find in the documentation what actually happens "behind the scenes" when executing an AsyncTask with AsyncTask.THREAD_POOL_EXECUTOR.
My question is: What do I lose by using AsyncTasks with AsyncTask.THREAD_POOL_EXECUTOR as opposed to implementing a custom ThreadPool with Runnables? (Let's talk post-honeycomb).
I realize the question is rather general, but I'm fairly new to doing concurrent programming (besides AsyncTask itself). I'm not looking for a tutorial on concurrent programming! I only seek to understand how the Android specific AsyncTask.THREAD_POOL_EXECUTOR is different. I think an explanation would be helpful for others in the future as they weigh the pros and cons of choosing to use AsyncTask vs Thread/Runnable. Thanks in advance!

AsyncTasks give you the possibility to execute actions on the UI thread before and after executing the worker task. So if you don't need to communicate with the UI, use your own executor; you can always implement that communication using a Handler. Since API 11, AsyncTasks started with execute() run serially by default, because parallel execution was considered too difficult to implement properly.
If you need more flexibility, then executors are the way to go: they let you freely specify how many tasks to execute in parallel, how many to put in the queue, etc.
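A plain java.util.concurrent sketch (these classes are also available on Android) of the knobs mentioned above: at most 2 tasks run in parallel, at most 4 wait in the queue, and further submissions are rejected by ThreadPoolExecutor's default AbortPolicy. The sizes and the names BoundedPoolDemo, newBoundedPool and countRejected are illustrative assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolDemo {
    // Core and maximum size 2 (parallel tasks), queue capacity 4 (waiting tasks).
    static ThreadPoolExecutor newBoundedPool() {
        return new ThreadPoolExecutor(2, 2, 30, TimeUnit.SECONDS, new ArrayBlockingQueue<>(4));
    }

    // Submits `tasks` tasks that all wait on a latch, and counts how many were rejected.
    static int countRejected(int tasks) {
        ThreadPoolExecutor pool = newBoundedPool();
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // keeps both workers busy so the queue fills up
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // default AbortPolicy: no thread and no queue slot left
            }
        }
        release.countDown(); // let the accepted tasks finish
        pool.shutdown();
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(countRejected(10)); // 4 (2 running + 4 queued were accepted)
    }
}
```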
If you are interested in details, you can always look into sources:
http://androidxref.com/4.4.3_r1.1/xref/development/samples/training/bitmapfun/BitmapFun/src/main/java/com/example/android/bitmapfun/util/AsyncTask.java

Non-UI work might be taken by anything including AsyncTasks, HandlerThreads, IntentServices etc.
The reason AsyncTasks are suggested for UI-related work (work that affects the UI) is that AsyncTask has helper callbacks that let you transfer control to the UI thread.
However, it's not suggested for longer-running operations since, by default, it uses a global executor, and long-running operations may stall other waiting tasks across the whole app. So you can switch to a custom executor and get rid of that global effect.
At the end of the day, HandlerThreads are, again, just threads, with a Looper that keeps the thread alive. Executions are still done serially, so what's the real reason to use them? I believe it's the ability to execute Runnables like Executors do, but in a more lightweight fashion.
IntentServices are, again, a way to execute tasks serially, but you get more power and isolation since they're entirely different components with separate lifecycles. They are destroyed automatically, so you don't have to worry about destroying them to reduce your app process priority (off topic, but this causes some memory and performance problems, thrashing, etc.).

Parallel-processing in Java; advice needed i.e. on Runnable/Callable interfaces

Assume that I have a set of objects that need to be analyzed in two different ways, both of which take a relatively long time and involve I/O calls. I am trying to figure out how, or whether, I could optimize this part of my software, especially by utilizing multiple processors (the machine I am sitting on, for example, is an 8-core i7 that almost never goes above 10% load during execution).
I am quite new to parallel-programming or multi-threading (not sure what the right term is), so I have read some of the prior questions, particularly paying attention to highly voted and informative answers. I am also in the process of going through the Oracle/Sun tutorial on concurrency.
Here's what I have worked out so far:
A thread-safe collection holds the objects to be analyzed
As soon as there are objects in the collection (they come a couple at a time from a series of queries), a thread per object is started
Each specific thread takes care of the initial pre-analysis preparations; and then calls on the analyses.
The two analyses are implemented as Runnables/Callables, and thus called on by the thread when necessary.
And my questions are:
Is this a reasonable scheme, if not, how would you go about doing this?
In order to make sure things don't get out of hand, should I implement a ThreadManager or something of that sort, which starts and stops threads and redistributes them when they are complete? For example, if I have 256 objects to be analyzed and 16 threads in total, the ThreadManager assigns the first finished thread to the 17th object, etc.
Is there a dramatic difference between Runnable/Callable other than the fact that Callable can return a result? Otherwise should I try to implement my own interface, in that case why?
Thanks,

You could use a BlockingQueue implementation to hold your objects and spawn your threads from there. This interface is based on the producer-consumer principle. The put() method will block if your queue is full until there is some more space and the take() method will block if the queue is empty until there are some objects again in the queue.
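A minimal producer-consumer sketch of put() and take(). ProducerConsumerDemo, transfer and the "<done>" end marker (a "poison pill", assumed never to occur as a real item) are hypothetical; uppercasing stands in for the real analysis. The queue capacity of 2 is deliberately small so that put() actually blocks while the consumer catches up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {
    // Hypothetical "poison pill" marking the end of input; assumes no real item equals it.
    private static final String DONE = "<done>";

    // The calling thread put()s items into a small bounded queue; one consumer thread take()s them.
    static List<String> transfer(List<String> items) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2); // put() blocks while 2 items wait
        List<String> results = new ArrayList<>();
        Thread consumer = new Thread(() -> {
            try {
                String item;
                while (!(item = queue.take()).equals(DONE)) { // take() blocks while the queue is empty
                    results.add(item.toUpperCase());          // stand-in for "analyze the object"
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (String item : items) {
                queue.put(item);
            }
            queue.put(DONE);
            consumer.join(); // after join(), the consumer's writes to `results` are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(transfer(List.of("a", "b", "c", "d", "e"))); // [A, B, C, D, E]
    }
}
```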
An ExecutorService can help you manage your pool of threads.
If you are awaiting a result from your spawned threads, then the Callable interface is a good choice, since you can start the computation early and keep working in your code, holding the pending results as Futures. As for the differences from the Runnable interface, from the Callable javadoc:
The Callable interface is similar to Runnable, in that both are designed for classes whose instances are potentially executed by another thread. A Runnable, however, does not return a result and cannot throw a checked exception.
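A small sketch of the two differences quoted from the javadoc, using hypothetical names (CallableVsRunnableDemo, compare). A submitted Runnable yields a Future whose get() returns null; a Callable's result comes back through its Future; and a checked exception thrown by a Callable surfaces as the cause of an ExecutionException from get().

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableVsRunnableDemo {
    static String compare() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Runnable sideEffect = () -> System.out.println("Runnable: no result");
            Callable<Integer> analysis = () -> 21 * 2;  // returns a value
            Callable<Integer> failing = () -> {         // may throw a checked exception
                throw new IOException("simulated I/O failure");
            };

            Future<?> f1 = pool.submit(sideEffect);
            Future<Integer> f2 = pool.submit(analysis);
            Future<Integer> f3 = pool.submit(failing);

            Object r1 = f1.get(); // always null for a Runnable
            int r2 = f2.get();    // the Callable's result
            String r3;
            try {
                f3.get();
                r3 = "no error";
            } catch (ExecutionException e) {
                r3 = e.getCause().getClass().getSimpleName(); // the task's own exception
            }
            return r1 + " " + r2 + " " + r3;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(compare()); // null 42 IOException
    }
}
```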
Some general things you need to consider in your quest for java concurrency:
Visibility is not guaranteed by default. volatile, AtomicReference and the other classes in the java.util.concurrent.atomic package are your friends.
You need to carefully ensure atomicity of compound actions using synchronization and locks.
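A sketch of both points, with hypothetical names (AtomicityDemo, record, run): a single counter can use AtomicInteger, while the min/max pair is a compound state that must change together, so it is updated under a lock. Thread.join() is what makes the workers' writes visible to the thread that reads the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicityDemo {
    private final AtomicInteger hits = new AtomicInteger(); // single variable: atomic, no lock needed
    private int min = Integer.MAX_VALUE;                     // min and max form one compound state,
    private int max = Integer.MIN_VALUE;                     // so they are only touched under the lock

    // Compound action: both fields must be updated together, so the method is synchronized.
    private synchronized void record(int value) {
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    // Runs `threads` threads that each record the values 0..perThread-1; returns {hits, min, max}.
    static int[] run(int threads, int perThread) {
        AtomicityDemo demo = new AtomicityDemo();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    demo.hits.incrementAndGet();
                    demo.record(i);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join() also makes the worker's writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        synchronized (demo) {
            return new int[] { demo.hits.get(), demo.min, demo.max };
        }
    }

    public static void main(String[] args) {
        int[] r = run(8, 10_000);
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // 80000 0 9999
    }
}
```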

Your idea is basically sound. However, rather than creating threads directly, or indirectly through some kind of ThreadManager of your own design, use an Executor from Java's concurrency package. It does everything you need, and other people have already taken the time to write and debug it. An executor manages a queue of tasks, so you don't need to worry about providing the threadsafe queue yourself either.
There's no difference between Callable and Runnable except that the former returns a value (and may throw checked exceptions). Executors will handle both, and treat them the same.
It's not clear to me whether you're planning to make the preparation step a separate task to the analyses, or fold it into one of them, with that task spawning the other analysis task halfway through. I can't think of any reason to strongly prefer one to the other, but it's a choice you should think about.

The Executors class provides factory methods for creating thread pools. Specifically, Executors#newFixedThreadPool(int nThreads) creates a fixed-size thread pool that uses an unbounded queue. Also, if a thread terminates due to a failure, a new thread will take its place. So in your specific example of 256 tasks and 16 threads you would call:
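import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
// create a pool of 16 threads
ExecutorService threadPool = Executors.newFixedThreadPool(16);
// submit a task (the lambda body stands in for analyzing one object)
Runnable task = () -> { /* analyze one object */ };
threadPool.submit(task);
// once all tasks are submitted, stop accepting new ones
threadPool.shutdown();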
The important question is determining the proper number of threads for your thread pool. See if the question "Efficient Number of Threads" helps.
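A fuller sketch of the 256-objects/16-threads case from the question: the fixed pool's internal queue hands the next object to whichever thread finishes first, which is the job the proposed ThreadManager would do. AnalyzeAllDemo and analyze are hypothetical; analyze only doubles its input as a stand-in for the real I/O-bound analysis, and 16 threads is the question's number, not a recommendation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AnalyzeAllDemo {
    // Hypothetical stand-in for the expensive analysis of one object.
    static int analyze(int objectId) {
        return objectId * 2;
    }

    // Queues one Callable per object; the pool's threads pick them up as they become free.
    static long analyzeAll(int objects, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int id = 0; id < objects; id++) {
                final int objectId = id;
                tasks.add(() -> analyze(objectId));
            }
            long total = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) { // invokeAll waits for every task
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(analyzeAll(256, 16)); // 65280
    }
}
```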

Sounds reasonable, but it's not as trivial to implement as it may seem.
Maybe you should check the jsr166y project.
That's probably the easiest solution to your problem.

Should I limit the number of Executors I have?

I have a Java project where I need to run things in parallel. I do this with executors. The thing is, I need to use executors in a great many places. Should I favor passing a few executors around to do the work (forget about limiting the global number of threads for a moment) or is it preferable to create the executors where I need them?

What you really need to think about is controlling the number of Threads working off any Executors you create.
The number of threads you create off each executor will be a function of the frequency of arrival and expected duration (processing time) of each task being submitted. Having a queue per logical task type allows you to tune the executor for just that task, so that you don't have more threads than required, and you can always keep up with the expected task throughput.
If you have one monolithic Executor shared across all processing stages in your app it becomes much harder to tune.
SEDA is a typical concurrency pattern that reflects this principle of queue per processing stage.
In some instances it does make sense to have a shared executor, such as for infrequent, ad-hoc or low priority scheduled tasks.
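A sketch of the queue-per-stage idea in the spirit of SEDA: each stage has its own executor, sized for its own workload. The stage names, the pool sizes (8 threads for an I/O-bound "fetch", one per core for a CPU-bound "parse") and the fake fetch/parse bodies are all assumptions for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StagedPipelineDemo implements AutoCloseable {
    // One executor per processing stage, each tunable on its own (sizes are assumptions).
    private final ExecutorService fetchStage = Executors.newFixedThreadPool(8); // I/O-bound
    private final ExecutorService parseStage =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()); // CPU-bound

    public CompletableFuture<String> process(String url) {
        return CompletableFuture
                .supplyAsync(() -> fetch(url), fetchStage)                 // stage 1 queue
                .thenApplyAsync(StagedPipelineDemo::parse, parseStage);   // stage 2 queue
    }

    // Hypothetical stage bodies standing in for real network and parsing work.
    static String fetch(String url) {
        return "<p>" + url + "</p>";
    }

    static String parse(String html) {
        return html.replaceAll("<[^>]+>", "");
    }

    @Override
    public void close() {
        fetchStage.shutdown();
        parseStage.shutdown();
    }

    public static void main(String[] args) {
        try (StagedPipelineDemo pipeline = new StagedPipelineDemo()) {
            System.out.println(pipeline.process("example.org/a").join()); // example.org/a
        }
    }
}
```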

There's no strict rule that will tell you how many executors should be used. One thing, though, can be recommended: use some dependency injection mechanism or framework to inject executor implementations. This allows quick and easy replacement and configuration of the executors you use.
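A framework-free sketch of the same idea using plain constructor injection: the hypothetical ReportService depends only on the Executor interface, so production code can pass a real pool while a test passes a same-thread executor (Runnable::run). A DI framework would do the production wiring for you; no particular framework is assumed here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InjectionDemo {
    // Hypothetical service: it depends on the Executor interface, not on a concrete pool.
    static final class ReportService {
        private final Executor executor;

        ReportService(Executor executor) {
            this.executor = executor;
        }

        CompletableFuture<String> buildReport(String name) {
            return CompletableFuture.supplyAsync(() -> "report:" + name, executor);
        }
    }

    public static void main(String[] args) {
        // Production-style wiring: a real thread pool.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        System.out.println(new ReportService(pool).buildReport("q1").join()); // report:q1
        pool.shutdown();

        // Test-style wiring: run each task directly on the calling thread.
        Executor direct = Runnable::run;
        System.out.println(new ReportService(direct).buildReport("q2").join()); // report:q2
    }
}
```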
