Monitoring the size of the Netty event loop queues

We've implemented monitoring for the Netty event loop queues in order to understand issues with some of our Netty modules.
The monitor uses the io.netty.util.concurrent.SingleThreadEventExecutor#pendingTasks method, which works for most modules, but for a module that handles a few thousand HTTP requests per second it seems to hang, or to be very slow.
I now realize that the docs explicitly state this can be an issue, and I feel pretty lame... so I'm looking for another way to implement this monitor.
You can see the old code here:
https://github.com/outbrain/ob1k/blob/6364187b30cab5b79d64835131d9168c754f3c09/ob1k-core/src/main/java/com/outbrain/ob1k/common/metrics/NettyQueuesGaugeBuilder.java
public static void registerQueueGauges(final MetricFactory factory, final EventLoopGroup elg, final String componentName) {
    int index = 0;
    for (final EventExecutor eventExecutor : elg) {
        if (eventExecutor instanceof SingleThreadEventExecutor) {
            final SingleThreadEventExecutor singleExecutor = (SingleThreadEventExecutor) eventExecutor;
            factory.registerGauge("EventLoopGroup-" + componentName, "EventLoop-" + index, new Gauge<Integer>() {
                @Override
                public Integer getValue() {
                    return singleExecutor.pendingTasks();
                }
            });
            index++;
        }
    }
}
My question is, is there a better way to monitor the queue sizes?
This can be quite a useful metric, as it can be used to understand latency, and also to apply back-pressure in some cases.

You'd probably need to track the changes as tasks are added to and removed from the SingleThreadEventExecutor instances.
To do that you could create a class that wraps and/or extends SingleThreadEventExecutor. Then you'd have a java.util.concurrent.atomic.AtomicInteger on which you'd call incrementAndGet() every time a new task is added and decrementAndGet() every time one is removed/finishes.
That AtomicInteger would then give you the current number of pending tasks. You could probably override pendingTasks() to use that value instead (though be careful there - I'm not 100% sure that wouldn't have side effects).
It would add a bit of overhead to every task being executed, but would make retrieving the number of pending tasks near constant speed.
The downside to this is of course that it's more invasive than what you are doing at the moment, as you'd need to configure your app to use different event executors.
NB. this is just a suggestion on how to work around the issue - I've not specifically done this with Netty. Though I've done this sort of thing with other code in the past.
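To make the counting idea above concrete, here is a minimal sketch. It deliberately wraps a plain java.util.concurrent.Executor rather than a Netty executor, and the class name CountingExecutor is made up for illustration, so treat it as an outline of the technique and not as Netty-ready code.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper: counts tasks that were handed over but have not finished yet. */
final class CountingExecutor implements Executor {
    private final Executor delegate;
    private final AtomicInteger pending = new AtomicInteger();

    CountingExecutor(Executor delegate) {
        this.delegate = delegate;
    }

    @Override
    public void execute(Runnable task) {
        pending.incrementAndGet();             // count the task as soon as it is submitted
        try {
            delegate.execute(() -> {
                try {
                    task.run();
                } finally {
                    pending.decrementAndGet(); // finished, normally or with an exception
                }
            });
        } catch (RejectedExecutionException e) {
            pending.decrementAndGet();         // the delegate refused it, so it never became pending
            throw e;
        }
    }

    /** Constant-time read; the value includes a task that is currently running. */
    int pendingTasks() {
        return pending.get();
    }
}
```

A gauge would then read pendingTasks() on the wrapper instead of asking the underlying executor to count its queue.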

Now, in 2021, Netty uses JCTools queues internally and pendingTasks() executes very fast (almost always in constant time), so even though the Javadoc still declares that this operation is slow, you can use it without any concerns.
Previously the issue was that counting the elements in the queue was a linear operation, but after the migration to the JCTools library this problem disappeared.

Related

How do I make a block aware execution context?

For some reason I can't wrap my head around implementing this. I've got an application running with Play that calls out to Elastic Search. As part of my design, my service uses the Java API wrapped with Scala futures, as shown in this blog post. I've updated the code from that post to hint to the ExecutionContext that it will be doing some blocking I/O, like so:
import scala.concurrent.{blocking, Future, Promise}
import org.elasticsearch.action.{ActionRequestBuilder, ActionListener, ActionResponse}
def execute[RB <: ActionRequestBuilder[_, T, _, _]](request: RB): Future[T] = {
  blocking {
    request.execute(this)
    promise.future
  }
}
My actual service that constructs the queries to send to ES takes an executionContext as a constructor parameter that it then uses for calls to Elastic Search. I did this so that the global execution context that Play uses won't have its threads tied down by the blocking calls to ES. This S.O. comment mentions that only the global context is blocking aware, so that leaves me to have to create my own. In that same post/answer there's a lot of information about using a ForkJoin pool, but I'm not sure how to take what's written in those docs and combine it with the hints in the blocking documentation to create an execution context that responds to blocking hints.
I think one of the issues I have is that I'm not sure exactly how to respond to the blocking context in the first place? I was reading the best practices and the example it uses is an unbounded cache of threads:
Note that here I prefer to use an unbounded "cached thread-pool", so it doesn't have a limit. When doing blocking I/O the idea is that you've got to have enough threads that you can block. But if unbounded is too much, depending on use-case, you can later fine-tune it, the idea with this sample being that you get the ball rolling.
So does this mean that with my ForkJoin backed thread pool, that I should try to use a cached thread when dealing with non-blocking I/O and create a new thread for blocking IO? Or something else? Pretty much every resource I find online about using separate thread pools tends to do what the Neophyte's Guide does, which is to say:
How to tune your various thread pools is highly dependent on your individual application and beyond the scope of this article.
I know it depends on your application, but in this case I just want to create some type of blocking-aware ExecutionContext and understand a decent strategy for managing the threads. If the context is specifically for a single part of the application, should I just make a fixed-size thread pool and not use/ignore the blocking keyword in the first place?
I tend to ramble, so I'll try to break down what I'm looking for in an answer:
Code! Reading all these docs still leaves me feeling just out of reach of being able to code a blocking-aware context, and I'd really appreciate an example.
Any links or tips on how to handle blocking threads, i.e. make a new thread for them endlessly, check the number of threads available and reject if too many, some other strategy
I'm not looking for performance tips here, I know I'll only get that with testing, but I can't test if I can't figure out how to code the contexts in the first place! I did find an example of ForkJoins vs threadpools but I'm missing the crucial part about blocking there.
Sorry for the long question here, I'm just trying to give you a sense of what I'm looking at and that I have been trying to wrap my head around this for over a day and need some outside help.
Edit: Just to make this clear, the ElasticSearch Service's constructor signature is:
//Note that these are not implicit parameters!
class ElasticSearchService(otherParams ..., val executionContext: ExecutionContext)
And in my application start up code I have something like this:
object Global extends GlobalSettings {
  val elasticSearchContext = //Custom Context goes here
  ...
  val elasticSearchService = new ElasticSearchService(params, elasticSearchContext);
  ...
}
I am also reading through Play's recommendations for contexts, but have yet to see anything about blocking hints, and I suspect I might have to go look into the source to see if they extend the BlockContext trait.

So I dug into the documentation, and Play's best practice for the situation I'm dealing with is the following:
In certain circumstances, you may wish to dispatch work to other thread pools. This may include CPU heavy work, or IO work, such as database access. To do this, you should first create a thread pool, this can be done easily in Scala:
And provides some code:
object Contexts {
  implicit val myExecutionContext: ExecutionContext = Akka.system.dispatchers.lookup("my-context")
}
The context is from Akka, so I ran down there searching for the defaults and types of Contexts they offer, which eventually led me to the documentation on dispatchers. The default is a ForkJoinPool whose default method for managing a block is to call the managedBlock(blocker). This led me to reading the documentation that stated:
Blocks in accord with the given blocker. If the current thread is a ForkJoinWorkerThread, this method possibly arranges for a spare thread to be activated if necessary to ensure sufficient parallelism while the current thread is blocked.
So it seems like if I have a ForkJoinWorkerThread then the behavior I think I want will take place. Looking at the source of ForkJoinPool some more I noted that the default thread factory is:
val defaultForkJoinWorkerThreadFactory: ForkJoinWorkerThreadFactory = juc.ForkJoinPool.defaultForkJoinWorkerThreadFactory
Which implies to me that if I use the defaults in Akka, that I'll get a context which handles blocking in the way I expect.
So reading the Akka documentation again it would seem that specifying my context something like this:
my-context {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 8
    parallelism-factor = 3.0
    parallelism-max = 64
    task-peeking-mode = "FIFO"
  }
  throughput = 100
}
would be what I want.
While I was searching in the source code I did some looking for uses of blocking or of calling managedBlock and found an example of overriding the ForkJoin behavior in ThreadPoolBuilder
private[akka] class AkkaForkJoinWorkerThread(_pool: ForkJoinPool) extends ForkJoinWorkerThread(_pool) with BlockContext {
  override def blockOn[T](thunk: ⇒ T)(implicit permission: CanAwait): T = {
    val result = new AtomicReference[Option[T]](None)
    ForkJoinPool.managedBlock(new ForkJoinPool.ManagedBlocker {
      def block(): Boolean = {
        result.set(Some(thunk))
        true
      }
      def isReleasable = result.get.isDefined
    })
    result.get.get // Exception intended if None
  }
}
This seems like what I originally asked for: an example of how to make something that implements BlockContext. That file also has code showing how to make an ExecutorServiceFactory, which is what I believe is referenced by the executor part of the configuration. So I think that if I wanted a totally custom context, I would extend some type of WorkerThread, write my own ExecutorServiceFactory that uses the custom worker thread, and then specify the fully qualified class name in the property, like this post advises.
I'm probably going to go with using Akka's forkjoin :)
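For readers who want to see the managedBlock mechanism without the Akka and Scala layers, here is a rough Java sketch of the same pattern. The class and method names (ManagedBlocking, blocking) are made up for this example; only ForkJoinPool.managedBlock and ForkJoinPool.ManagedBlocker come from the JDK.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.function.Supplier;

/** Hypothetical helper mirroring the blockOn idea: run a blocking call as a ManagedBlocker. */
final class ManagedBlocking {
    private ManagedBlocking() {}

    static <T> T blocking(Supplier<T> blockingCall) {
        class Blocker implements ForkJoinPool.ManagedBlocker {
            private T result;
            private boolean done;

            @Override
            public boolean block() {
                result = blockingCall.get(); // the actual blocking work happens here
                done = true;
                return true;                 // true means no further blocking is needed
            }

            @Override
            public boolean isReleasable() {
                return done;
            }
        }
        Blocker blocker = new Blocker();
        try {
            // On a ForkJoinWorkerThread the pool may activate a spare worker while we block;
            // on any other thread this simply runs block() in place.
            ForkJoinPool.managedBlock(blocker);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while blocking", e);
        }
        return blocker.result;
    }
}
```

Called from a task running in a ForkJoinPool, something like ManagedBlocking.blocking(() -> client.search(query)) (client and query being placeholders) gives the pool the chance to compensate for the blocked worker.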

"Asynchronous while loop" in JavaFX thread

When I need to do an indeterminate number of pieces of work in the JavaFX thread without blocking the user interface, I use this class
public class AsyncWhile {
    private final IntPredicate hook;
    private int schedCount = 0;
    private boolean terminated = false;
    private int callCount = 0;
    private static final int schedN = 1;
    public AsyncWhile(IntPredicate hook) {
        this.hook = hook;
        schedule();
    }
    public void kill(){
        terminated = true;
    }
    private void schedule(){
        while(schedCount < schedN){
            Platform.runLater(this::poll);
            schedCount++;
        }
    }
    private void poll(){
        schedCount--;
        if(!terminated){
            terminated = !hook.test(callCount++);
            if(!terminated){
                schedule();
            }
        }
    }
}
like this
asyncWhile = new AsyncWhile(i -> {
    // return false when you're done
    // or true if you want to be called again
});
// we can call asyncWhile.kill() should we need to
(
If you need a more concrete example, here I'm reading one line at a time from an InputStream and then parsing and displaying a plot parsed from that line:
asyncWhile = new AsyncWhile(i -> {
    String line;
    try {
        if((line = reader.readLine()).startsWith(" Search complete.")){ // it so happens that this reader must be read in the JavaFX thread, because it automatically updates a console window
            return false;
        } else {
            Task<MatchPlot> task = new ParsePlotTask(line);
            task.setOnSucceeded(wse -> {
                plotConsumer.accept(task.getValue());
                // todo update progress bar
            });
            executorService.submit(task);
            return true;
        }
    } catch (IOException ex) {
        new ExceptionDialog(ex).showAndWait();
        return false;
    }
});
)
Chaining up runLaters like that feels like a hack. What is the proper way to solve this kind of problem? (By "this kind of problem" I mean the problem that would have been solved by a simple while loop, had it not been for the fact that its contents must run in the JavaFX thread without making the UI unresponsive.)

Recommended
In general, basing a solution off of the PartialResultsTask sample from the Task documentation (which relies on Platform.runLater invocations), is the standard way of solving this problem.
Alternate
Rather than scheduling runLater calls, you could use a BlockingDeque. In your processing task, you perform your time-consuming process with a normal while loop, generate the non-UI model objects which need to be represented in the JavaFX UI, and put those objects into your queue. Then you set up a Timeline or AnimationTimer that polls the queue, draining it as necessary, picks the items off the queue and represents them in the UI.
This approach is similar (but a bit different) to: Most efficient way to log messages to JavaFX TextArea via threads with simple custom logging frameworks.
Using your own queue in this case is not much different from using the implicit queue runLater invocations go on to, though, with your own queue, you might have a little more control over the process if you need that. It's a trade-off though, as it adds a bit more custom code and complexity, so probably just use the recommended PartialResults sample from Task and, if that doesn't fit your needs, then perhaps investigate the alternative custom queue based approach.
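As a rough illustration of the custom queue alternative, the sketch below keeps the JavaFX parts out so it stays self-contained. ResultQueue, publish and drain are names invented for this example, and the per-tick limit is an assumption about how you might keep each UI update short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical hand-off: a worker thread publishes results, the UI side drains bounded batches. */
final class ResultQueue<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final int maxPerTick;

    ResultQueue(int maxPerTick) {
        this.maxPerTick = maxPerTick;
    }

    /** Called from the background thread for every finished model object. */
    void publish(T item) {
        queue.add(item);
    }

    /** Called from the UI thread on each tick; handles at most maxPerTick items and returns the count. */
    int drain(Consumer<? super T> uiUpdate) {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch, maxPerTick); // never blocks, takes only what is already queued
        batch.forEach(uiUpdate);
        return batch.size();
    }
}
```

In a JavaFX application the drain call would typically sit inside the handle(long now) method of an AnimationTimer, so the work per frame is capped and anything left over waits for the next pulse.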
Aside
As a side note, you could use the custom logging framework linked earlier to log console messages from multiple threads to be displayed in your UI. That way you don't need to have your reader.readLine call execute I/O on the JavaFX UI, which is not recommended. Instead, have the I/O performed off the JavaFX UI thread and, as you process items, call into the logging framework to log messages that will eventually show up on the UI (the internal mechanisms within the logging framework take care of ensuring that JavaFX threading rules are respected).
Can you see any danger in using my approach?
Sorry for being non-specific here. I'm not going to answer this directly. Tangentially, and not always applicable to your approach, using runLater can cause issues. Mostly it is not a concern, but here are some things to consider:
If you send enough runLater calls faster than they can be processed, eventually you will either run out of memory or some runLater calls will start being ignored (depending on how the runLater system works).
Calls to runLater are sequential, not prioritized, so if there are internal events which are also being runLater, such as handling UI events, those might be delayed while your runLater calls are being processed.
runLater offers no guarantee of when later is. If your work is time sensitive, that might be an issue or at least something you need to account for in your implementation.
The runLater system is likely internally fairly complex and you won't know exactly how it is implemented unless you study the source code pretty closely.
Anything that you run on runLater is going to hold up the JavaFX application thread, probably until all of the outstanding runLater calls are complete
Once you have issued a bunch of runLater calls, you can't easily intersperse their processing over multiple pulses in the JavaFX animation system, they will likely all be executed on the next pulse. So you have to be careful not to send too many calls at once.
Those are just some things that come to mind.
In general though, runLater is a sound mechanism for many tasks and a core part of the JavaFX architecture. For most things the above considerations don't really have any consequence.
Writing quality multi-threaded code is pretty tricky, to the point where it is often best avoided where possible, which is what the JavaFX system attempts to do for the most part by making scene graph access single-threaded. If you must do it, then stick to the patterns outlined in the Task documentation or use some of the high-level java.util.concurrent utilities as much as possible rather than implementing your own systems. Also note that reading multi-threaded code is even trickier than writing it, so make sure what you do is clear to the next person.

How can I run a background thread that cleans up some elements in list regularly?

I am currently implementing a cache. I have completed a basic implementation, like below. What I want to do is to run a thread that will remove entries that satisfy certain conditions.
class Cache {
int timeLimit = 10; //how long each entry needs to be kept after accessed(marked)
int maxEntries = 10; //maximum number of Entries
HashSet<String> set = new HashSet<String>();
public void add(Entry t){
....
}
public Entry access(String key){
//mark Entry that it has been used
//Since it has been marked, background thread should remove this entry after timeLimit seconds.
return set.get(key);
}
....
}
My question is, how should I implement a background thread so that the thread will go through the entries in the set and remove the ones that have been marked && (now - last access time) > timeLimit?
edit
The above is just a simplified version of the code, so I did not write the synchronized statements.

Why are you reinventing the wheel? EhCache (and any decent cache implementation) will do this for you. Also, the much more lightweight MapMaker cache from Guava can automatically remove old entries.
If you really want to implement this yourself, it is not really that simple.
Remember about synchronization. You should use ConcurrentHashMap or synchronized keyword to store entries. This might be really tricky.
You must store the last access time of each entry somehow. Every time you access an entry, you must update that timestamp.
Think about eviction policy. If there are more than maxEntries in your cache, which ones to remove first?
Do you really need a background thread?
This is surprising, but EhCache (enterprise-ready and proven) does not use a background thread to invalidate old entries. Instead it waits until the map is full and removes entries lazily. This looks like a good trade-off as threads are expensive.
If you have a background thread, should there be one per cache or one global? Do you start a new thread while creating a new cache or have a global list of all caches? This is harder than you think...
Once you answer all these questions, the implementation is fairly simple: go through all the entries every second or so and if the condition you've already written is met, remove the entry.

I'd use Guava's Cache type for this, personally. It's already thread-safe and has methods built in for eviction from the cache based on some time limit. If you want a thread to periodically sweep it, you can just do something like this:
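new Thread(new Runnable() {
    public void run() {
        // Loop so the sweep actually repeats; a single pass would clean up only once.
        while (!Thread.currentThread().isInterrupted()) {
            cache.cleanUp();
            try { Thread.sleep(MY_SLEEP_DURATION); }
            catch (InterruptedException e) { return; } // stop sweeping when interrupted
        }
    }
}).start();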

I don't imagine you really need a background thread. Instead you can just remove expired entries before or after you perform a lookup. This simplifies the entire implementation and it's very hard to tell the difference.
BTW: If you use a LinkedHashMap, you can use it as a LRU cache by overriding removeEldestEntry (see its javadocs for an example)
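A minimal sketch of the LinkedHashMap approach mentioned above, assuming a size limit is all you need. The class name LruCache is made up here; the access-order constructor and removeEldestEntry hook are standard LinkedHashMap features.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache: evicts the least recently accessed entry once maxEntries is exceeded. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // third argument true = iterate in access order, not insertion order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put/putAll after an insert; returning true drops the eldest entry.
        return size() > maxEntries;
    }
}
```

Note that LinkedHashMap is not thread-safe, and with access ordering even get() modifies the map, so concurrent use would need something like Collections.synchronizedMap around it.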

First of all, your presented code is incomplete because there is no get(key) on HashSet (so I assume you mean some kind of Map instead) and your code does not mention any "marking." There are also many ways to do caching, and it is difficult to pick out the best solution without knowing what you are trying to cache and why.
When implementing a cache, it is usually assumed that the data-structure will be accessed concurrently by multiple threads. So the first thing you will need to do is to make use of a backing data-structure that is thread-safe. HashMap is not thread-safe, but ConcurrentHashMap is. There are also a number of other concurrent Map implementations out there, namely in Guava, Javolution and high-scale lib. There are other ways to build caches besides maps, and their usefulness depends on your use case. Regardless, you will most likely need to make the backing data-structure thread-safe, even if you decide you don't need the background thread and instead evict expired objects upon attempting to retrieve them from the cache, or let the GC remove the entries by using SoftReferences.
Once you have made the internals of your cache thread-safe, you can simply fire up a new (most likely daemonized) thread that periodically sweeps/iterates the cache and removes old entries. The thread would do this in a loop (until interrupted, if you want to be able to stop it again) and then sleep for some amount of time after each sweep.
However, you should consider whether it is worth it for you, to build your own cache implementation. Writing thread-safe code is not easy, and I recommend that you study it before endeavouring to write your own cache implementation. I can recommend the book Java Concurrency in Practice.
The easier way to go about this is, of course, to use an existing cache implementation. There are many options available in Java-land, all with their own unique set of trade-offs.
EhCache and JCS are both general purpose caches that fit most caching needs one would find in a typical "enterprise" application.
Infinispan is a cache that is optimised for distributed use, and can thus cache more data than what can fit on a single machine. I also like its ConcurrentMap based API.
As others have mentioned, Google's Guava library has a Cache API, which is quite useful for smallish in-memory caches.
Since you want to limit the number of entries in the cache, you might be interested in an object-pool instead of a cache.
Apache Commons-Pool is widely used, and has APIs that resemble what you are trying to build yourself.
Stormpot, on the other hand, has a rather different API, and I am pretty much only mentioning it because I wrote it. It's probably not what you want, but who can be sure without knowing what you are trying to cache and why?

First, either synchronize access to your collection or use a ConcurrentHashMap-based Set (the JDK has no ConcurrentHashSet class), as indicated in the comments below.
Second, write your new thread, and implement it as an endless loop that periodically iterates the prior collection and removes the elements. You should write this class in a way that it is initialized with the correct collection in the constructor, so that you do not have to worry about "how do I access the proper collection".
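Putting the two steps above together, a sketch might look like the following. It swaps the hand-written endless loop for a ScheduledExecutorService with a daemon thread, and all names (ExpiringCache, sweep, startSweeper) are invented for this example. It also accepts a small race where an entry is accessed just as it is being swept.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical cache whose entries expire a fixed time after their last access. */
final class ExpiringCache<K, V> {
    private static final class Timed<T> {
        final T value;
        volatile long lastAccessNanos;
        Timed(T value, long now) { this.value = value; this.lastAccessNanos = now; }
    }

    private final Map<K, Timed<V>> entries = new ConcurrentHashMap<>();
    private final long timeLimitNanos;

    ExpiringCache(long timeLimit, TimeUnit unit) {
        this.timeLimitNanos = unit.toNanos(timeLimit);
    }

    void add(K key, V value) {
        entries.put(key, new Timed<>(value, System.nanoTime()));
    }

    V access(K key) {
        Timed<V> t = entries.get(key);
        if (t == null) return null;
        t.lastAccessNanos = System.nanoTime(); // mark the entry as used
        return t.value;
    }

    /** Removes entries idle for longer than the limit as of nowNanos; returns the remaining size. */
    int sweep(long nowNanos) {
        entries.values().removeIf(t -> nowNanos - t.lastAccessNanos > timeLimitNanos);
        return entries.size();
    }

    /** Starts a daemon thread that sweeps periodically; shut the returned service down to stop it. */
    ScheduledExecutorService startSweeper(long period, TimeUnit unit) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-sweeper");
            t.setDaemon(true); // do not keep the JVM alive just for sweeping
            return t;
        });
        ses.scheduleWithFixedDelay(() -> sweep(System.nanoTime()), period, period, unit);
        return ses;
    }
}
```

Passing the current time into sweep keeps the eviction rule easy to test without sleeping, and the cache owns its own map, so the sweeper never has to look up which collection to clean.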

Java: TaskExecutor for Asynchronous Database Writes?

I'm thinking of using Java's TaskExecutor to fire off asynchronous database writes. Understandably threads don't come for free, but assuming I'm using a fixed threadpool size of say 5-10, how is this a bad idea?
Our application reads from a very large file using a buffer and flushes this information to a database after performing some data manipulation. Using asynchronous writes seems ideal here so that we can continue working on the file. What am I missing? Why doesn't every application use asynchronous writes?

Why doesn't every application use asynchronous writes?
It's often necessary/useful/easier to deal with a write failure in a synchronous manner.

I'm not sure a threadpool is even necessary. I would consider using a dedicated databaseWriter thread which does all writing and error handling for you. Something like:
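public class AsyncDatabaseWriter implements Runnable {
    private LinkedBlockingQueue<Data> queue = ....
    private volatile boolean terminate = false;
    public void run() {
        while (!terminate) {
            try {
                Data data = queue.take(); // blocks until an item is available
                // write to database
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop the writer
                return;
            }
        }
    }
    public void scheduleWrite(Data data) {
        queue.add(data);
    }
}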
I personally fancy the style of using a Proxy for threading out operations which might take a long time. I'm not saying this approach is better than using executors in any way, just adding it as an alternative.

The idea is not bad at all. Actually I just tried it yesterday because I needed to create a copy of an online database which has 5 different categories with about 60000 items each.
By moving the parse/save operation of each category into parallel tasks and partitioning each category import into smaller batches run in parallel, I reduced the total import time from several hours (estimated) to 26 minutes. Along the way I found a good piece of code for splitting the collection: http://www.vogella.de/articles/JavaAlgorithmsPartitionCollection/article.html
I used ThreadPoolTaskExecutor to run the tasks. Your tasks are just simple implementations of the Callable interface.
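The batching approach described above can be sketched with the plain JDK executor API instead of Spring's ThreadPoolTaskExecutor. BatchImporter, partition and importAll are hypothetical names, and saveBatch stands in for whatever actually writes a batch to the database and reports how many rows it wrote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntFunction;

/** Hypothetical helper: split a list into batches and save the batches in parallel. */
final class BatchImporter {
    private BatchImporter() {}

    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    /** Runs saveBatch for every batch on a fixed pool and returns the total reported by all batches. */
    static <T> int importAll(List<T> items, int batchSize, int threads, ToIntFunction<List<T>> saveBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (List<T> batch : partition(items, batchSize)) {
                tasks.add(() -> saveBatch.applyAsInt(batch));
            }
            int total = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) { // waits for all batches
                total += f.get();                             // a failed batch surfaces here
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("import interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every Future is checked, a write failure in any batch is reported to the caller instead of being silently lost, which addresses the main worry about asynchronous writes raised in the other answers.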

why doesn't every application use asynchronous writes? - erm because every application does a different thing.
can you believe some applications don't even use a database OMG!!!!!!!!!
seriously though, given that you don't say what your failure strategies are, it sounds like it could be reasonable. What happens if the write fails, or the db goes away somehow?
some databases - like sybase - have (or at least had) a thing where they really don't like multiple writers to a single table - all the writers ended up blocking each other - so maybe it won't actually make much difference...

What design pattern to use for a threaded queue

I have a very complex system (100+ threads) which need to send email without blocking. My solution to the problem was to implement a class called EmailQueueSender which is started at the beginning of execution and has a ScheduledExecutorService which looks at an internal queue every 500ms and if size()>0 it empties it.
While this is going on there's a synchronized static method called addEmailToQueue(String[]) which accepts an email containing body,subject..etc as an array. The system does work, and my other threads can move on after adding their email to queue without blocking or even worrying if the email was successfully sent...it just seems to be a little messy...or hackish...Every programmer gets this feeling in their stomach when they know they're doing something wrong or there's a better way. That said, can someone slap me on the wrist and suggest a more efficient way to accomplish this?
Thanks!

http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
this class alone will probably handle most of the stuff you need.
just put the sending code in a runnable and add it with the execute method.
the getQueue method will allow you to retrieve the current list of waiting items so you can save it when restarting the sender service without losing emails

If you are using Java 6, then you can make heavy use of the primitives in the java.util.concurrent package.
Having a separate thread that handles the real sending is completely normal. Instead of polling a queue, I would rather use a BlockingQueue as you can use a blocking take() instead of busy-waiting.
If you are interested in whether the e-mail was successfully sent, your append method could return a Future so that you can pass the return value on once you have sent the message.
Instead of having an array of Strings, I would recommend creating a (almost trivial) Java class to hold the values. Object creation is cheap these days.
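Here is one way the suggestions above could fit together, as a sketch only. Email and EmailQueueSender are illustrative classes, and the transport predicate is a placeholder for the real mail-sending code. A single-thread executor is used because it already combines a worker thread with a blocking queue.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Small value class replacing the String[] mentioned in the question. */
final class Email {
    final String to;
    final String subject;
    final String body;

    Email(String to, String subject, String body) {
        this.to = to;
        this.subject = subject;
        this.body = body;
    }
}

/** Hypothetical sender: callers enqueue and move on, one worker thread does the sending. */
final class EmailQueueSender {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Predicate<Email> transport; // placeholder: returns true if the mail was sent

    EmailQueueSender(Predicate<Email> transport) {
        this.transport = transport;
    }

    /** Never blocks the caller; the Future reports later whether sending succeeded. */
    Future<Boolean> enqueue(Email email) {
        Callable<Boolean> task = () -> transport.test(email);
        return worker.submit(task);
    }

    void shutdown() {
        worker.shutdown(); // already queued mails are still sent
    }
}
```

Callers that do not care about the outcome can simply ignore the returned Future, while callers that do can call get() on it later.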

I'm not sure if this would work for your application, but it sounds like it would. A ThreadPoolExecutor (an ExecutorService implementation) can take a BlockingQueue as an argument, and you can simply add new tasks (Runnables) to the queue. When you are done you simply terminate the ThreadPoolExecutor.
private BlockingQueue<Runnable> queue;
...
ThreadPoolExecutor executor = new ThreadPoolExecutor(10, 10, 1000L,
        TimeUnit.MILLISECONDS, this.queue);
You can keep a count of all the tasks added to the queue (issuedThreads below). When you think you are done (the queue is empty, perhaps?) simply compare this to the completed task count:
if (issuedThreads == pool.getCompletedTaskCount()) {
pool.shutdown();
}
If the two match, you are done. Another way to terminate the pool is to wait a second in a loop:
try {
    while (!this.pool.awaitTermination(1000, TimeUnit.MILLISECONDS));
} catch (InterruptedException e) {
    // log exception...
}

There might be a full blown mail package out there already, but I would probably start with Spring's support for email and job scheduling. Fire a new job for each email to be sent, and let the timing of the executor send the jobs and worry about how many need to be done. No queuing involved.
Underneath the framework, Spring is using Java Mail for the email part, and lets you choose between ThreadPoolExecutor (as mentioned by @Lorenzo) or Quartz. Quartz is better in my opinion, because you can even set it up so that it fires your jobs at fixed points in time like cron jobs (e.g. at midnight). The advantage of using Spring is that it greatly simplifies working with these packages, so that your job is even easier.

There are many packages and tools that will help with this, but the generic name for cases like this, extensively studied in computer science, is producer-consumer problem. There are various well-known solutions for it, which could be considered 'design patterns'.
