How do I make a block aware execution context? - java

For some reason I can't wrap my head around implementing this. I've got an application running with Play that calls out to Elastic Search. As part of my design, my service uses the Java API wrapped with Scala futures, as shown in this blog post. I've updated the code from that post to hint to the ExecutionContext that it will be doing some blocking I/O, like so:
import scala.concurrent.{blocking, Future, Promise}
import org.elasticsearch.action.{ActionRequestBuilder, ActionListener, ActionResponse }
def execute[RB <: ActionRequestBuilder[_, T, _, _]](request: RB): Future[T] = {
blocking {
My actual service that constructs the queries to send to ES takes an ExecutionContext as a constructor parameter, which it then uses for calls to Elastic Search. I did this so that the global execution context that Play uses won't have its threads tied down by the blocking calls to ES. This S.O. comment mentions that only the global context is blocking-aware, so that leaves me having to create my own. In that same post/answer there's a lot of information about using a ForkJoin pool, but I'm not sure how to take what's written in those docs and combine it with the hints in the blocking documentation to create an execution context that responds to blocking hints.
I think one of the issues I have is that I'm not sure exactly how to respond to the blocking context in the first place? I was reading the best practices and the example it uses is an unbounded cache of threads:
Note that here I prefer to use an unbounded "cached thread-pool", so it doesn't have a limit. When doing blocking I/O the idea is that you've got to have enough threads that you can block. But if unbounded is too much, depending on use-case, you can later fine-tune it, the idea with this sample being that you get the ball rolling.
So does this mean that with my ForkJoin-backed thread pool, I should try to use a cached thread when dealing with non-blocking I/O and create a new thread for blocking I/O? Or something else? Pretty much every resource I find online about using separate thread pools tends to do what the Neophyte's Guide does, which is to say:
How to tune your various thread pools is highly dependent on your individual application and beyond the scope of this article.
I know it depends on your application, but in this case I just want to create some type of blocking-aware ExecutionContext and understand a decent strategy for managing the threads. If the context is specifically for a single part of the application, should I just make a fixed-size thread pool and ignore the blocking keyword in the first place?
I tend to ramble, so I'll try to break down what I'm looking for in an answer:
Code! Reading all these docs still leaves me feeling just out of reach of being able to code a blocking-aware context, and I'd really appreciate an example.
Any links or tips on how to handle blocking threads, i.e. make a new thread for them endlessly, check the number of threads available and reject if too many, some other strategy
I'm not looking for performance tips here, I know I'll only get that with testing, but I can't test if I can't figure out how to code the contexts in the first place! I did find an example of ForkJoins vs thread pools, but I'm missing the crucial part about blocking there.
Sorry for the long question here, I'm just trying to give you a sense of what I'm looking at and that I have been trying to wrap my head around this for over a day and need some outside help.
Edit: Just to make this clear, the ElasticSearch Service's constructor signature is:
//Note that these are not implicit parameters!
class ElasticSearchService(otherParams ..., val executionContext: ExecutionContext)
And in my application start up code I have something like this:
object Global extends GlobalSettings {
  val elasticSearchContext = //Custom Context goes here
  val elasticSearchService = new ElasticSearchService(params, elasticSearchContext);
}
I am also reading through Play's recommendations for contexts, but have yet to see anything about blocking hints, and I suspect I might have to go look into the source to see if they extend the BlockContext trait.

So I dug into the documentation, and Play's best-practice advice for the situation I'm dealing with is:
In certain circumstances, you may wish to dispatch work to other thread pools. This may include CPU heavy work, or IO work, such as database access. To do this, you should first create a thread pool, this can be done easily in Scala:
And provides some code:
object Contexts {
  implicit val myExecutionContext: ExecutionContext = Akka.system.dispatchers.lookup("my-context")
}
The context is from Akka, so I ran down there searching for the defaults and types of Contexts they offer, which eventually led me to the documentation on dispatchers. The default is a ForkJoinPool whose default method for managing a block is to call the managedBlock(blocker). This led me to reading the documentation that stated:
Blocks in accord with the given blocker. If the current thread is a ForkJoinWorkerThread, this method possibly arranges for a spare thread to be activated if necessary to ensure sufficient parallelism while the current thread is blocked.
So it seems like if I have a ForkJoinWorkerThread then the behavior I think I want will take place. Looking at the source of ForkJoinPool some more I noted that the default thread factory is:
val defaultForkJoinWorkerThreadFactory: ForkJoinWorkerThreadFactory = juc.ForkJoinPool.defaultForkJoinWorkerThreadFactory
Which implies to me that if I use the defaults in Akka, that I'll get a context which handles blocking in the way I expect.
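To make the managedBlock behaviour quoted above a bit more concrete, here is a minimal plain-Java sketch. It is not taken from Akka or Play; the names ManagedBlockingSketch, SupplierBlocker and blocking are made up for illustration, and the sleep only stands in for real blocking I/O. It wraps a blocking call in a ForkJoinPool.ManagedBlocker so that, when it runs on a ForkJoinWorkerThread, the pool may activate a spare thread while the call blocks:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.function.Supplier;

// Hypothetical helper, not part of Akka, Play or the JDK.
final class ManagedBlockingSketch {

    // Adapts a blocking Supplier to the ManagedBlocker interface.
    static final class SupplierBlocker<T> implements ForkJoinPool.ManagedBlocker {
        private final Supplier<T> supplier;
        private T result;
        private boolean done;

        SupplierBlocker(Supplier<T> supplier) {
            this.supplier = supplier;
        }

        @Override
        public boolean block() {
            result = supplier.get(); // the actual blocking call
            done = true;
            return true;             // no further blocking is necessary
        }

        @Override
        public boolean isReleasable() {
            return done;
        }

        T result() {
            return result;
        }
    }

    // Runs the supplier through managedBlock and returns its value.
    static <T> T blocking(Supplier<T> supplier) throws InterruptedException {
        SupplierBlocker<T> blocker = new SupplierBlocker<>(supplier);
        ForkJoinPool.managedBlock(blocker);
        return blocker.result();
    }

    // Small demo: submit one "blocking" task to a ForkJoinPool.
    static int demo() {
        ForkJoinPool pool = new ForkJoinPool(2);
        Callable<Integer> task = () -> blocking(() -> {
            try {
                Thread.sleep(50); // stands in for blocking I/O
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return 42;
        });
        try {
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Scala's blocking { ... } plays roughly the role of the blocking(...) helper here, as long as the current thread's BlockContext forwards to managedBlock.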
So reading the Akka documentation again it would seem that specifying my context something like this:
my-context {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 8
    parallelism-factor = 3.0
    parallelism-max = 64
    task-peeking-mode = "FIFO"
  }
  throughput = 100
}
would be what I want.
While I was searching in the source code I looked for uses of blocking or calls to managedBlock, and found an example of overriding the ForkJoin behavior in ThreadPoolBuilder:
private[akka] class AkkaForkJoinWorkerThread(_pool: ForkJoinPool) extends ForkJoinWorkerThread(_pool) with BlockContext {
  override def blockOn[T](thunk: ⇒ T)(implicit permission: CanAwait): T = {
    val result = new AtomicReference[Option[T]](None)
    ForkJoinPool.managedBlock(new ForkJoinPool.ManagedBlocker {
      def block(): Boolean = {
        result.set(Some(thunk)) // run the blocking code and keep its result
        true
      }
      def isReleasable = result.get.isDefined
    })
    result.get.get // Exception intended if None
  }
}
This seems like what I originally asked for: an example of how to make something that implements BlockContext. That file also has code showing how to make an ExecutorServiceFactory, which is what I believe is referenced by the executor part of the configuration. So I think what I would do if I wanted a totally custom context would be to extend some type of WorkerThread, write my own ExecutorServiceFactory that uses the custom worker thread, and then specify the fully qualified class name in the property, as this post advises.
I'm probably going to go with using Akka's forkjoin :)


Java's equivalent to Scala's futures?

I'm looking for what the equivalent to Scala's futures is in Java.
I'm looking for a type of construct that allows me to submit tasks (Runnables / Callables) to a specific thread-pool of my choice, returning futures allowing me to chain some logic (in a non-blocking way) to it when it gets completed. Something like this:
var executor = Executors.newCachedThreadPool();
executor.submit(() -> {
    return 666;
}).onComplete(v -> System.out.println(v));
Java's thread-pools (available through the Executors singleton) seem to always return standard Java Futures, which only allow me to call a blocking get(). On the other hand, the CompletableFuture, from what I can understand, is more akin to Scala's promises, and is not tied to a thread-pool.
Does Java provide what I'm looking for? How do people in Java-land deal with these kinds of operations?
You want callback hell? That's a new one.
from what I can understand, is more akin to Scala's promises, and is not tied to a thread-pool.
Incorrect. I think CompletableFuture is precisely what you want :)
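There is a default executor that will be used, but you can also specify one explicitly if you prefer: the supplyAsync and runAsync methods have overloads where you can pass in an explicit executor. Note that the chained ...Async methods (such as whenCompleteAsync) also run on the default executor unless you pass an executor to them as well, while the non-Async variants (such as whenComplete) run on whichever thread completes the previous stage, or on the calling thread if that stage has already completed.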
CompletableFuture.supplyAsync(() -> {
    System.out.println("Helloooo there, from stage 1!");
    return 666;
}).whenCompleteAsync((result, exception) -> {
    System.out.println("Coming to you live from stage 2: " + result);
    // result is null if an error has occurred in stage 1.
    // exception is null if an error did not occur.
});
NB: If you toss this in a public static void main, make sure to add a .get() at the very end; without that, the VM will exit before the future gets a shot to actually do the work. Then, you will see:
> Helloooo there, from stage 1!
> Coming to you live from stage 2: 666
If each stage sleeps for about 5 seconds before printing, the first string appears after ~5 seconds, the second after ~10, and then the VM exits.
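If you want every stage on a pool of your own, a minimal sketch could look like the following. The class name ExplicitExecutorSketch and the thread name "my-pool-thread" are made up for illustration; the point is only that the executor is passed to supplyAsync and again to the chained ...Async call:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; names are illustrative only.
final class ExplicitExecutorSketch {

    static String run() {
        // Every thread of this pool gets the same recognizable name.
        ExecutorService pool = Executors.newFixedThreadPool(2, r -> {
            Thread t = new Thread(r, "my-pool-thread");
            t.setDaemon(true);
            return t;
        });
        try {
            return CompletableFuture
                    .supplyAsync(() -> 666, pool)               // stage 1 on our pool
                    .thenApplyAsync(
                            v -> v + " on " + Thread.currentThread().getName(),
                            pool)                               // stage 2 on our pool too
                    .join();                                    // wait for the result
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling ExplicitExecutorSketch.run() returns "666 on my-pool-thread", which shows that the second stage ran on the supplied pool rather than on the common pool.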
Note that it looks like Java's future is not, heh, futures. Futures in Java suffer from callback hell, and do not solve the 'red / blue' methods issue (that's the issue where invoking anything that (potentially) blocks from async code is a very problematic bug: It is hard to detect statically, almost impossible to test for, and nevertheless will completely ruin your performance in production. Unfortunately, it is relatively hard to realize you've accidentally done something that blocks, and few existing APIs and libraries both document whether they do or not, and then commit to never changing this without considering that change a backwards incompatible update and managing their versions appropriately to reflect this).
These are solvable problems, but it won't be easy: Java will need 'async' or similar, and also a serious effort to add documentation and possibly something that can be compile-time checked, e.g. with an annotation.
But none of that is anywhere on the horizon of java's future. What IS on the horizon, however (real soon now, timespan of a year or 2 at the most) is 'Project Loom' - which adds lightweight threads ('fibers') to java: They represent execution state but cannot, themselves, run on another core. You can make millions of em, no problem. Then you just write:
int priceyOp1 = doSomethingThatTakesLong();
int priceyOp2 = thisIsAlsoSlow(priceyOp1);
and shove the whole thing into a fiber, and even have a hopper concept where a thread pool will realize the fiber it is currently running is now blocked and go fish another fiber out of the pool and run that for a while. That doesn't make futures completely pointless, but it does probably mean that futures will remain niche, and async/callback hell will not be solved anytime soon.

Monitoring the size of the Netty event loop queues

We've implemented monitoring for the Netty event loop queues in order to understand issues with some of our Netty modules.
The monitor uses the io.netty.util.concurrent.SingleThreadEventExecutor#pendingTasks method, which works for most modules, but for a module that handles a few thousand HTTP requests per second it seems to hang, or to be very slow.
I now realize that the docs explicitly warn that this can be an issue, and I feel pretty lame... so I'm looking for another way to implement this monitor.
You can see the old code here:
public static void registerQueueGauges(final MetricFactory factory, final EventLoopGroup elg, final String componentName) {
    int index = 0;
    for (final EventExecutor eventExecutor : elg) {
        if (eventExecutor instanceof SingleThreadEventExecutor) {
            final SingleThreadEventExecutor singleExecutor = (SingleThreadEventExecutor) eventExecutor;
            factory.registerGauge("EventLoopGroup-" + componentName, "EventLoop-" + index, new Gauge<Integer>() {
                public Integer getValue() {
                    return singleExecutor.pendingTasks();
                }
            });
        }
    }
}
My question is, is there a better way to monitor the queue sizes?
This can be quite a useful metric, as it can be used to understand latency, and also to be used for applying back-pressure in some cases.
You'd probably need to track the changes as tasks are added to and removed from the SingleThreadEventExecutor instances.
To do that you could create a class that wraps and/or extends SingleThreadEventExecutor. Then you'd have a java.util.concurrent.atomic.AtomicInteger on which you'd call incrementAndGet() every time a new task is added and decrementAndGet() every time one is removed/finishes.
That AtomicInteger would then give you the current number of pending tasks. You could probably override pendingTasks() to use that value instead (though be careful there - I'm not 100% sure that wouldn't have side effects).
It would add a bit of overhead to every task being executed, but would make retrieving the number of pending tasks near constant speed.
The downside to this is of course that it's more invasive than what you are doing at the moment, as you'd need to configure your app to use different event executors.
NB. this is just a suggestion on how to work around the issue - I've not specifically done this with Netty. Though I've done this sort of thing with other code in the past.
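To illustrate the counting idea without touching Netty internals, here is a minimal sketch against the plain java.util.concurrent.Executor interface. The class name CountingExecutor is made up, and it is an assumption of this sketch that "pending" means "submitted but not yet started", so the counter is decremented when a task starts running rather than when it finishes:

```java
import java.util.ArrayDeque;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper; not a Netty class.
final class CountingExecutor implements Executor {
    private final Executor delegate;
    private final AtomicInteger pending = new AtomicInteger();

    CountingExecutor(Executor delegate) {
        this.delegate = delegate;
    }

    @Override
    public void execute(Runnable task) {
        pending.incrementAndGet();          // task enters the queue
        try {
            delegate.execute(() -> {
                pending.decrementAndGet();  // task left the queue and starts running
                task.run();
            });
        } catch (RuntimeException e) {
            pending.decrementAndGet();      // task was rejected, so it never queued
            throw e;
        }
    }

    // Constant-time read of the number of queued tasks.
    int pendingTasks() {
        return pending.get();
    }

    // Demo with a hand-driven queue standing in for an event loop.
    static int[] demo() {
        ArrayDeque<Runnable> queue = new ArrayDeque<>();
        CountingExecutor executor = new CountingExecutor(queue::add);
        executor.execute(() -> { });
        executor.execute(() -> { });
        int afterSubmit = executor.pendingTasks();
        queue.poll().run();
        int afterOne = executor.pendingTasks();
        queue.poll().run();
        int afterTwo = executor.pendingTasks();
        return new int[] { afterSubmit, afterOne, afterTwo };
    }
}
```

In the demo the counter goes 2, 1, 0 as the two queued tasks are picked up. Wiring something like this into a real event loop group is the invasive part mentioned above.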
Now, in 2021, Netty uses JCTools queues internally and pendingTasks() is very fast (almost always constant-time), so even though the Javadoc still declares that this operation is slow, you can use it without any concerns.
Previously the issue was that counting the elements in the queue was a linear operation, but after the migration to the JCTools library this problem disappeared.

Java: guide-line for when to use thread-pooling?

This is a high-volume production system; however, this particular code path is seldom used. It's an import feature that can potentially result in a lot of data coming in, but it's only used occasionally, a few times a month, perhaps.
Having a (polite) debate with a colleague. The issue is whether a simple thread created the old fashioned way:
Runnable thread = new Runnable() {
    public void run() {
        //... do the import work ...
    }
};
new Thread(thread).start();
is sufficient, or whether this requires using a thread pool.
This is happening in a service-layer class that is called from a servlet (providing a RESTful interface). The purpose being to allow the response to return and free the UI while the import happens.
As a follow on - in this situation, is using a thread pool actually just going to add more unnecessary (coding and resource use) overhead?
After EJP's comment - is there a good guideline for when it becomes 'worth having a discussion' about using pooling instead of straight thread creation?
A thread pool would only be useful if you were planning on starting a lot of these threads and wanted to avoid thread creation overhead by re-using them instead of killing and re-creating them for subsequent work.
Since this code path is used so rarely, you will not need a threadpool.
However, it sounds like you are doing this heavy work in the same process that serves your REST API? You may want to consider passing this work to a worker that runs in a separate process.

Best practices with Akka in Scala and third-party Java libraries

I need to use the memcached Java API in my Scala/Akka code. This API gives you both synchronous and asynchronous methods. The asynchronous ones return java.util.concurrent.Future. There was a question here about dealing with Java Futures in Scala: How do I wrap a java.util.concurrent.Future in an Akka Future?. However, in my case I have two options:
Use the synchronous API, wrap the blocking code in a Future, and mark it as blocking:
Future {
  blocking {
    cache.get(key) //synchronous blocking call
  }
}
Use the asynchronous Java API and poll the Java Future every n ms to check whether it has completed (as described in one of the answers to the question linked above).
Which one is better? I am leaning towards the first option because polling can dramatically impact response times. Shouldn't the blocking { } block prevent the whole pool from being blocked?
I always go with the first option, but I do it in a slightly different way. I don't use the blocking feature. (Actually, I have not thought about it yet.) Instead, I provide a custom execution context to the Future that wraps the synchronous blocking call. So it looks basically like this:
val ecForBlockingMemcachedStuff = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(100)) // whatever number you think is appropriate
// i create a separate ec for each blocking client/resource/api i use
Future {
cache.get(key) //synchronous blocking call
}(ecForBlockingMemcachedStuff) // or mark the execution context implicit. I like to mention it explicitly.
So all the blocking calls will use a dedicated execution context (= Threadpool). So it is separated from your main execution context responsible for non blocking stuff.
This approach is also explained in an online training video for Play/Akka provided by Typesafe. There is a video in lesson 4 about how to handle blocking calls. It is explained by Nilanjan Raychaudhuri (hope I spelled it correctly), who is a well-known author of Scala books.
Update: I had a discussion with Nilanjan on twitter. He explained what the difference between the approach with blocking and a custom ExecutionContext is. The blocking feature just creates a special ExecutionContext. It provides a naive approach to the question how many threads you will need. It spawns a new thread every time, when all the other existing threads in the pool are busy. So it is actually an uncontrolled ExecutionContext. It could create lots of threads and lead to problems like an out of memory error. So the solution with the custom execution context is actually better, because it makes this problem obvious. Nilanjan also added that you need to consider circuit breaking for the case this pool gets overloaded with requests.
TLDR: Yeah, blocking calls suck. Use a custom/dedicated ExecutionContext for blocking calls. Also consider circuit breaking.
The Akka documentation provides a few suggestions on how to deal with blocking calls:
In some cases it is unavoidable to do blocking operations, i.e. to put a thread to sleep for an indeterminate time, waiting for an external event to occur. Examples are legacy RDBMS drivers or messaging APIs, and the underlying reason is typically that (network) I/O occurs under the covers. When facing this, you may be tempted to just wrap the blocking call inside a Future and work with that instead, but this strategy is too simple: you are quite likely to find bottlenecks or run out of memory or threads when the application runs under increased load.
The non-exhaustive list of adequate solutions to the “blocking problem” includes the following suggestions:
Do the blocking call within an actor (or a set of actors managed by a router), making sure to configure a thread pool which is either dedicated for this purpose or sufficiently sized.
Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded number of tasks of this nature will exhaust your memory or thread limits).
Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the hardware on which the application runs.
Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they occur as actor messages.
The first possibility is especially well-suited for resources which are single-threaded in nature, like database handles which traditionally can only execute one outstanding query at a time and use internal synchronization to ensure this. A common pattern is to create a router for N actors, each of which wraps a single DB connection and handles queries as sent to the router. The number N must then be tuned for maximum throughput, which will vary depending on which DBMS is deployed on what hardware.
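As a rough illustration of the "thread pool with an upper limit" suggestion, here is a minimal plain-Java sketch. The class name BoundedBlockingPool, the pool size of 1 and the queue capacity of 1 are arbitrary assumptions chosen to make rejection easy to see; real values depend on your hardware and workload. Both the number of threads and the number of waiting tasks are bounded, and anything beyond that is rejected instead of piling up:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; sizes are placeholders, not recommendations.
final class BoundedBlockingPool {

    // Fixed number of threads, bounded queue, reject when both are full.
    static ThreadPoolExecutor newPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    // Submits tasks that block until released and counts the rejections.
    static int countRejected(int submissions) {
        ThreadPoolExecutor pool = newPool(1, 1);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < submissions; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // stands in for a blocking call
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // pool and queue are full: fail fast
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }
}
```

With one worker thread and one queue slot, the first task runs, the second waits in the queue, and the remaining submissions are rejected, so countRejected(5) returns 3. A circuit breaker or a fallback could hook into that catch block.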

What design pattern to use for a threaded queue

I have a very complex system (100+ threads) which needs to send email without blocking. My solution to the problem was to implement a class called EmailQueueSender which is started at the beginning of execution and has a ScheduledExecutorService which looks at an internal queue every 500ms and, if size()>0, empties it.
While this is going on there's a synchronized static method called addEmailToQueue(String[]) which accepts an email containing body, subject, etc. as an array. The system does work, and my other threads can move on after adding their email to the queue without blocking or even worrying whether the email was successfully sent. It just seems to be a little messy... or hackish... Every programmer gets this feeling in their stomach when they know they're doing something wrong or there's a better way. That said, can someone slap me on the wrist and suggest a more efficient way to accomplish this?
this class alone will probably handle most of the stuff you need.
just put the sending code in a runnable and add it with the execute method.
the getQueue method will allow you to retrieve the current list of waiting items so you can save it when restarting the sender service without losing emails
If you are using Java 6, then you can make heavy use of the primitives in the java.util.concurrent package.
Having a separate thread that handles the real sending is completely normal. Instead of polling a queue, I would rather use a BlockingQueue as you can use a blocking take() instead of busy-waiting.
If you are interested in whether the e-mail was successfully sent, your append method could return a Future so that you can pass the return value on once you have sent the message.
Instead of having an array of Strings, I would recommend creating a (almost trivial) Java class to hold the values. Object creation is cheap these days.
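Putting these three suggestions together (a small value class, a BlockingQueue with a blocking take(), and a Future returned to the caller), a minimal sketch might look like this. All names here (EmailQueueSketch, Email, enqueue, send) are made up for illustration, and send is only a stand-in for real mail-sending code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not a drop-in replacement for EmailQueueSender.
final class EmailQueueSketch {

    // Simple value class instead of a String[].
    static final class Email {
        final String to;
        final String subject;
        final String body;

        Email(String to, String subject, String body) {
            this.to = to;
            this.subject = subject;
            this.body = body;
        }
    }

    private final BlockingQueue<FutureTask<Boolean>> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    EmailQueueSketch() {
        worker = new Thread(() -> {
            try {
                while (true) {
                    queue.take().run(); // blocks until work arrives, no polling
                }
            } catch (InterruptedException e) {
                // interrupted: treat as a shutdown request and exit
            }
        }, "email-sender");
        worker.setDaemon(true);
        worker.start();
    }

    // Producers return immediately; the Future reports success later.
    Future<Boolean> enqueue(Email email) {
        FutureTask<Boolean> task = new FutureTask<>(() -> send(email));
        queue.add(task);
        return task;
    }

    // Stand-in for the real sending code (assumption of this sketch).
    private boolean send(Email email) {
        return !email.to.isEmpty();
    }

    void shutdown() {
        worker.interrupt();
    }

    static boolean demo() {
        EmailQueueSketch sender = new EmailQueueSketch();
        try {
            Email email = new Email("someone@example.com", "Hi", "Hello");
            return sender.enqueue(email).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            sender.shutdown();
        }
    }
}
```

Threads that don't care about the outcome can simply ignore the returned Future.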
I'm not sure if this would work for your application, but it sounds like it would. A ThreadPoolExecutor (an ExecutorService implementation) can take a BlockingQueue as an argument, and you can simply add new tasks (Runnables) to the queue. When you are done, you simply terminate the ThreadPoolExecutor.
private BlockingQueue<Runnable> queue;
ThreadPoolExecutor executor = new ThreadPoolExecutor(10, 10, 1000L,
        TimeUnit.MILLISECONDS, this.queue);
You can keep a count of all the tasks added to the queue. When you think you are done (the queue is empty, perhaps?) simply compare this to
if (issuedThreads == pool.getCompletedTaskCount()) {
If the two match, you are done. Another way to terminate the pool is to wait a second in a loop:
try {
    while (!this.pool.awaitTermination(1000, TimeUnit.MILLISECONDS));
} catch (InterruptedException e) {
    // log exception...
}
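Note that awaitTermination only returns true once shutdown() has been called on the pool, so a sketch of the full sequence might look like this. The class name PoolDrainSketch, the pool size and the no-op tasks are placeholders:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; sizes and tasks are placeholders.
final class PoolDrainSketch {

    // Submits the given number of tasks, drains the pool and
    // returns how many tasks the pool reports as completed.
    static long runAll(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 1000L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> { }); // stands in for sending one email
        }
        pool.shutdown(); // stop accepting work, keep running queued tasks
        try {
            while (!pool.awaitTermination(1000, TimeUnit.MILLISECONDS)) {
                // still draining; loop and wait again
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt
        }
        return pool.getCompletedTaskCount();
    }
}
```

Once the pool has terminated, the completed-task count matches the number of tasks that were submitted.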
There might be a full blown mail package out there already, but I would probably start with Spring's support for email and job scheduling. Fire a new job for each email to be sent, and let the timing of the executor send the jobs and worry about how many need to be done. No queuing involved.
Underneath the framework, Spring is using Java Mail for the email part, and lets you choose between ThreadPoolExecutor (as mentioned by @Lorenzo) or Quartz. Quartz is better in my opinion, because you can even set it up so that it fires your jobs at fixed points in time like cron jobs (e.g. at midnight). The advantage of using Spring is that it greatly simplifies working with these packages, so that your job is even easier.
There are many packages and tools that will help with this, but the generic name for cases like this, extensively studied in computer science, is the producer-consumer problem. There are various well-known solutions for it, which could be considered 'design patterns'.
