Sending Async HTTP requests in Java using Spring Boot

I am working on an application that needs to continuously test thousands of proxy servers. The application is based on Spring Boot.
My current approach is an @Async-annotated method that takes a proxy server and returns the result.
I often get an OutOfMemoryError and the processing is very slow. I assume that is because each async method is executed in a separate thread which blocks on I/O?
Everywhere I read about async in Java, people mix up parallel execution in threads with non-blocking I/O. In the Python world, there is the asyncio library, which executes I/O requests in a single thread. While one method is waiting for a response from the server, it starts executing another method.
I think in my case I need something like this, because Spring's @Async is not suitable for me. Can someone please clear up my confusion and suggest how I should go about this challenge?
I want to check hundreds of proxies simultaneously without putting excessive load on the system.
I have read about the Apache Async HTTP Client, but I don't know if it is suitable.
This is the thread pool configuration I am using:
public Executor proxyTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(Runtime.getRuntime().availableProcessors() * 2 - 1);
    executor.setMaxPoolSize(100);
    executor.setDaemon(true);
    return executor;
}

I often get an OutOfMemoryError and the processing is very slow. I assume that is because each async method is executed in a separate thread which blocks on I/O?
I explain the OOME in the second point.
As for the slowness, it is indeed related to the I/O performed while processing the requests and responses.
The problem comes from the number of threads that effectively run in parallel.
With your current configuration, the max pool size is never reached (I explain why below).
Suppose that corePoolSize == 10 in your case. That means 10 threads run in parallel. Suppose each thread takes about 3 seconds to test a site.
Amortized over the pool, that is one site tested about every 0.3 seconds, so testing 1,000 sites takes about 300 seconds.
That is quite slow, and a large part of it is waiting time: the I/O spent sending the request to, and receiving the response from, the site currently under test.
To increase the overall speed, you should initially run many more threads in parallel than your core capacity. That way, I/O waiting time becomes less of a problem: while one thread is blocked waiting for a response, the scheduler can run another thread that has work to do.
That should handle the OOME issue and probably improve the execution time considerably, but there is no guarantee you will get a very short time.
To achieve that, you should probably refine the multithreading logic and rely on APIs/libraries with non-blocking I/O.
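For example, Java 11's built-in java.net.http.HttpClient exposes exactly this kind of asynchronous API (the Apache Async HTTP Client mentioned in the question works along similar lines). Here is a minimal sketch, assuming Java 11+; the test URL, the timeout, and the boolean result are illustrative choices, not part of the question:

import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

public class ProxyChecker {

    // Illustrative target used to verify that the proxy forwards traffic.
    private static final URI TEST_URI = URI.create("https://example.com/");

    public CompletableFuture<Boolean> isAlive(String host, int port) {
        HttpClient client = HttpClient.newBuilder()
                .proxy(ProxySelector.of(new InetSocketAddress(host, port)))
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(TEST_URI).GET().build();
        // sendAsync() returns immediately; the future completes when the
        // response arrives, so no caller thread blocks on the proxy's I/O.
        return client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                .thenApply(response -> response.statusCode() == 200)
                .exceptionally(error -> false); // unreachable or broken proxy
    }
}

With an API like this, you can have hundreds of checks in flight while only a small pool of threads processes the completions.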
Here is some information from the official documentation that should be helpful.
This part explains the overall logic when a task is submitted (emphasis is mine):
The configuration of the thread pool should also be considered in
light of the executor’s queue capacity. For the full description of
the relationship between pool size and queue capacity, see the
documentation for ThreadPoolExecutor. The main idea is that, when a
task is submitted, the executor first tries to use a free thread if
the number of active threads is currently less than the core size. If
the core size has been reached, the task is added to the queue, as
long as its capacity has not yet been reached. Only then, if the
queue’s capacity has been reached, does the executor create a new
thread beyond the core size. If the max size has also been reached,
then the executor rejects the task.
And this part explains the consequences of the queue size (emphasis is still mine):
By default, the queue is unbounded, but this is rarely the desired
configuration, because it can lead to OutOfMemoryErrors if enough
tasks are added to that queue while all pool threads are busy.
Furthermore, if the queue is unbounded, the max size has no effect at
all. Since the executor always tries the queue before creating a new
thread beyond the core size, a queue must have a finite capacity for
the thread pool to grow beyond the core size (this is why a fixed-size
pool is the only sensible case when using an unbounded queue).
Long story short: you didn't set the queue capacity, which by default is unbounded (Integer.MAX_VALUE). So you fill the queue with hundreds of tasks that will be popped only much later. Those tasks hold a lot of memory, hence the OOME.
Besides, as explained in the documentation, this setting is useless with an unbounded queue, because a new thread would be created only when the queue is full:
executor.setMaxPoolSize(100);
Setting both values to something relevant makes more sense:
public Executor proxyTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(Runtime.getRuntime().availableProcessors() * 2 - 1);
    executor.setMaxPoolSize(100);
    // Bounded queue: prevents the OOME and lets the pool grow beyond the core size.
    executor.setQueueCapacity(100);
    executor.setDaemon(true);
    return executor;
}
Or, as an alternative, use a fixed-size pool with the same value for the core and max pool size:
Rather than only a single size, an executor’s thread pool can have
different values for the core and the max size. If you provide a
single value, the executor has a fixed-size thread pool (the core and
max sizes are the same).
public Executor proxyTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(100); // core == max: fixed-size pool
    executor.setMaxPoolSize(100);
    executor.setDaemon(true);
    return executor;
}
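For completeness, here is one way such an executor could be wired to the @Async method from the question. The bean name, the ProxyService class, and the checkProxy() signature are illustrative assumptions:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Service;

@Configuration
@EnableAsync
class AsyncConfig {

    @Bean("proxyTaskExecutor")
    public Executor proxyTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(100);
        executor.setMaxPoolSize(100);
        executor.setDaemon(true);
        executor.initialize();
        return executor;
    }
}

@Service
class ProxyService {

    // Runs on the pool configured above instead of Spring's default executor.
    @Async("proxyTaskExecutor")
    public CompletableFuture<Boolean> checkProxy(String host, int port) {
        boolean alive = ping(host, port); // hypothetical blocking check
        return CompletableFuture.completedFuture(alive);
    }

    private boolean ping(String host, int port) {
        return true; // placeholder for the actual proxy test
    }
}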
Note also that invoking the async service 1,000 times without a pause seems harmful in terms of memory, since the executor cannot process the submissions right away. You should probably split these invocations into smaller batches (2, 3, or more) and pause with Thread.sleep() between them.
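As a sketch of such batching (the ProxyService dependency, the port, and the batch size are illustrative assumptions):

import java.util.List;

public class BatchSubmitter {

    private final ProxyService proxyService; // hypothetical async service

    public BatchSubmitter(ProxyService proxyService) {
        this.proxyService = proxyService;
    }

    public void submitInBatches(List<String> hosts) throws InterruptedException {
        int batchSize = 200; // illustrative; tune to the queue capacity
        for (int i = 0; i < hosts.size(); i += batchSize) {
            List<String> batch = hosts.subList(i, Math.min(i + batchSize, hosts.size()));
            batch.forEach(host -> proxyService.checkProxy(host, 8080));
            Thread.sleep(5_000); // let the pool drain before the next batch
        }
    }
}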

Related

Why can't the core threads of a thread pool in Java be reused in the initial phase?

I recently had a question while reading the source code of ThreadPoolExecutor: if a thread pool exists to reuse existing threads and reduce the overhead of thread creation and destruction, why doesn't it reuse core threads in the initial phase? That is, when the current number of threads is less than the number of core threads, why not first check whether a core thread has finished its task and, if so, reuse it? Instead, a new thread is created until the core count is reached. Doesn't this violate the thread pool's design principles?
The following is part of the comment on the addWorker() method in ThreadPoolExecutor:
@param firstTask the task the new thread should run first (or null if none). Workers are created with an initial first task (in method execute()) to bypass queuing when there are fewer than corePoolSize threads (in which case we always start one), or when the queue is full (in which case we must bypass queue). Initially idle threads are usually created via prestartCoreThread or to replace other dying workers.
This was actually requested already: JDK-6452337. A core libraries developer has noted:
I like this idea, but ThreadPoolExecutor is already complicated enough.
Keep in mind that corePoolSize is an essential part of ThreadPoolExecutor: it says how many workers are, at a minimum, always active or idle. Reaching this number naturally takes a very short time. You set corePoolSize according to your needs, and it's expected that the workload will meet this number.
My assumption is that optimizing this "warm-up phase" – taking it for granted that this will actually increase efficiency – is not worth it. I can't quantify for you what additional complexity this optimization will bring, I'm not developing Java Core libraries, but I assume that it's not worth it.
You can think of it like that: The "warm-up phase" is constant while the thread pool will run for an undefined amount of time. In an ideal world, the initial phase actually should take no time at all, the workload should be there as you create the thread pool. So you are thinking about an optimization that optimizes something that is not the expected thread pool state.
The thread workers will have to be created at some point anyways. This optimization only delays the creation. Imagine you have a corePoolSize of 10, so there is the overhead of creating 10 threads at least. This overhead won't change if you do it later. Yes, resources are also taken later but here I'm asking if the thread pool is configured correctly in the first place: Is corePoolSize correct, does it meet the current workload?
Notice that ThreadPoolExecutor has methods like setCorePoolSize(int) and allowCoreThreadTimeOut(boolean) and more that allow you to configure the thread pool according to your needs.
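As a small illustration of those knobs (the sizes and timeouts here are arbitrary), you can even skip the warm-up phase entirely, or let core threads expire:

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreThreadsDemo {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 10, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        // Skip the warm-up phase: create all core threads immediately.
        int started = pool.prestartAllCoreThreads();
        System.out.println("prestarted " + started + " core threads");

        // Allow idle core threads to die after the keep-alive time,
        // so the pool can shrink below corePoolSize when it is idle.
        pool.allowCoreThreadTimeOut(true);

        pool.shutdown();
    }
}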

ThreadPoolExecutor javadoc, three strategies for queuing and lockups

In the Oracle documentation for the ThreadPoolExecutor class, it is written:
There are three general strategies for queuing:
Direct handoffs. A good default choice for a work queue is a SynchronousQueue that hands
off tasks to threads without otherwise holding them. Here, an attempt
to queue a task will fail if no threads are immediately available to
run it, so a new thread will be constructed. This policy avoids
lockups when handling sets of requests that might have internal
dependencies. Direct handoffs generally require unbounded
maximumPoolSizes to avoid rejection of new submitted tasks. This in
turn admits the possibility of unbounded thread growth when commands
continue to arrive on average faster than they can be processed.
Unbounded queues. Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new
tasks to wait in the queue when all corePoolSize threads are busy.
Thus, no more than corePoolSize threads will ever be created. (And the
value of the maximumPoolSize therefore doesn't have any effect.) This
may be appropriate when each task is completely independent of others,
so tasks cannot affect each others execution; for example, in a web
page server. While this style of queuing can be useful in smoothing
out transient bursts of requests, it admits the possibility of
unbounded work queue growth when commands continue to arrive on
average faster than they can be processed.
...
Why is the direct handoff strategy better at avoiding lockups than the unbounded queue strategy? Or do I understand it incorrectly?
Let's say you have corePoolSize = 1. If the first task submits another task to the same pool and waits for the result, it will lock up indefinitely.
However, if a task is completely independent, there would be no reason to use direct handoff as far as preventing lockups is concerned.
This is just an example; internal dependency can mean a lot of different things.
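Here is a minimal, self-contained demonstration of that lockup (the names are mine, not from the question):

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolLockupDemo {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        // One thread and an unbounded queue: the inner task can never start.
        ExecutorService pool = Executors.newFixedThreadPool(1);

        Future<String> outer = pool.submit(() -> {
            // The inner task sits in the queue behind us, but we hold the
            // only thread while blocking on its result: a pool-induced lockup.
            Future<String> inner = pool.submit(() -> "done");
            return inner.get(); // blocks forever
        });

        System.out.println(outer.get()); // never reached
    }
}

With a direct handoff (SynchronousQueue) and an unbounded max pool size, the inner submit would have created a fresh thread instead, which is the point the documentation is making.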

RejectedExecutionHandler - CallerRunsPolicy vs AbortPolicy

While setting up a thread pool configuration, how do you choose the correct RejectedExecutionHandler?
I have a legacy application that publishes events (those events could be consumed locally or by a remote process). At the moment the policy is to abort, which causes lots of exceptions and missed events. We pass a SynchronousQueue to the ThreadPoolExecutor.
I was thinking of changing the RejectedExecutionHandler to the caller-runs policy. This would mean the caller spends time running a task when the thread bound and queue capacity are reached. I don't see any problem with that.
What has been your experience so far? Also, does using an unbounded queue mean there is no utility for a RejectedExecutionHandler?
I think you are already familiar with the different RejectedExecutionHandlers of ThreadPoolExecutor.
In ThreadPoolExecutor.CallerRunsPolicy, the thread that invokes execute itself runs the task. This provides a simple feedback control mechanism that will slow down the rate that new tasks are submitted.
It will impact the overall performance of your application. If your application can afford this delay (not real-time work: e.g., batch processing, non-interactive or offline jobs), you can use this policy. If you can't afford the delay and are fine with discarding the task, you can go for ThreadPoolExecutor.DiscardPolicy.
Is using unbounded queue means no utility for RejectedExecutionHandler?
Yes. An unbounded queue means the RejectedExecutionHandler has no utility, because tasks are always queued rather than rejected. When you use an unbounded queue, make sure that your application's throughput stays under control with respect to memory and CPU utilization. If you are submitting short-duration tasks with a small memory footprint, an unbounded queue can work.
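A minimal sketch of the caller-runs setup (the pool sizes and the task are arbitrary):

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallerRunsDemo {
    public static void main(String[] args) {
        // Synchronous handoff and a small pool, so saturation happens quickly.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 30L, TimeUnit.SECONDS,
                new SynchronousQueue<>(),
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (int i = 0; i < 10; i++) {
            final int id = i;
            // When all 4 threads are busy, the submitting thread runs the task
            // itself instead of throwing RejectedExecutionException, which
            // naturally slows down the submission rate.
            pool.execute(() -> System.out.println(
                    "task " + id + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
    }
}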

How to identify the right java Executor?

We need to do some asynchronous task processing where around 30-40 requests will arrive at the same moment, and each request will initiate an async task that takes approximately 7-8 seconds to complete.
If the Java ExecutorService has been identified as the tool for such tasks, what would be the ideal type of executor for this purpose?
I thought of using a CachedThreadPool, but my worry is: if too many threads are created, would that have a performance impact on the application?
Another option would be to use a FixedThreadPool, but I am struggling to come up with an ideal number of threads to instantiate it with...
What is the recommended Executor for such a scenario, or how do we go about finding the right one?
I think you are limiting your research to just the Executors.* factory methods. You should review the range of constructors of ThreadPoolExecutor; you'll find a maximum thread pool size limit, among other things.
I thought of using a CachedThreadPool, but my worry is: if too many threads are created, would that have a performance impact on the application?
You need to test the application to measure the performance impact.
If none of them fits the application, or you run into issues, you can use a customized thread pool executor: java.util.concurrent.ThreadPoolExecutor.
You can customize it to your needs by configuring the core pool size and the blocking queue. The blocking queue is used to hold tasks once the core pool size has been reached.
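For the numbers in this question, a sketch might look like this (the sizes are rough guesses derived from the stated 30-40 concurrent 7-8 second tasks; verify them with load testing):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class AsyncTaskPool {

    public static ThreadPoolExecutor create() {
        return new ThreadPoolExecutor(
                40,                    // core size: covers the expected burst
                40,                    // max size: fixed pool, predictable behavior
                60L, TimeUnit.SECONDS, // keep-alive (relevant if core timeout is enabled)
                new ArrayBlockingQueue<>(100),            // bounded queue absorbs overflow
                new ThreadPoolExecutor.CallerRunsPolicy() // throttle instead of rejecting
        );
    }
}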

ThreadPoolExecutor on java doc

At http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html you can read the description of the parameters to the constructor.
Specifically, in the "Core and maximum pool sizes" paragraph, it is written:
If there are more than corePoolSize but less than maximumPoolSize threads running, a new thread will be created only if the queue is full.
...
By setting maximumPoolSize to an essentially unbounded value such as Integer.MAX_VALUE, you allow the pool to accommodate an arbitrary number of concurrent tasks.
Now I can't understand what "only if the queue is full" in the first part stands for.
Will ThreadPoolExecutor wait until the queue is full, or will it simply create a new worker?
And suppose now that we have tasks that are not asynchronous with respect to each other: could using a ThreadPoolExecutor cause a deadlock? Suppose my first 10 tasks are producers and corePoolSize is 10; the succeeding consumer tasks then go to the queue and won't run until the queue is full. If so, this behavior may cause a deadlock, because the first 10 producers could block waiting, suspending all 10 core threads.
When is the queue full?
I'm not sure I understood the documentation well, because Executors.newCachedThreadPool() seems to create a new worker until maxPoolSize is reached and THEN send tasks to the queue.
I'm a little confused.
Thank you
When you construct the ThreadPoolExecutor, you pass in an instance of BlockingQueue<Runnable> called workQueue, to hold the tasks, and it is this queue that is being referred to.
In fact, the section of the docs called "Queuing" goes into more detail about the phrase you're confused about:
Any BlockingQueue may be used to transfer and hold submitted tasks. The use of this queue interacts with pool sizing:
If fewer than corePoolSize threads are running, the Executor always prefers adding a new thread rather than queuing.
If corePoolSize or more threads are running, the Executor always prefers queuing a request rather than adding a new thread.
If a request cannot be queued, a new thread is created unless this would exceed maximumPoolSize, in which case, the task will be rejected.
As for your second part, about inter-task dependencies - in this case I don't think it's a good idea to put them into an ExecutorService at all. The ExecutorService is good for running a self-contained bit of code at some point in the future, but by design it's not meant to be strongly deterministic about when this happens, other than "at some convenient point in the (hopefully near) future, after tasks that were previously queued have started."
Combine this lack of precision of timing, with the hard ordering requirements that concurrent operation imposes, and you can see that having a producer and a consumer that need to talk to each other, put into a general purpose ExecutorService, is a recipe for very annoying and confusing bugs.
Yes, I'm sure you could get it to work with sufficient tweaking of parameters. However, it wouldn't be clear why it worked, it wouldn't be clear what the dependencies were, and when (not if) it broke, it would be very hard to diagnose. (Harder than normal concurrency problems, I suspect). The bottom line is that an ExecutorService isn't designed to run Runnables with hard timing restrictions, so this could even be broken by a new release of Java, because it doesn't have to work like this.
I think you're asking the wrong question, by looking at the details when perhaps your concepts are a little shaky. Perhaps if you explained what you wanted to achieve there would be a better way to go about it.
To clarify your first quote - if the executor already has corePoolSize threads running, and all of those threads are busy when a new task is submitted, it will not create any more threads, but will enqueue the task until one of the threads becomes free. It will only create a new thread (up to the maxPoolSize limit) when the queue becomes full. If the queue is huge (i.e. bounded only by memory constraints) then no more than corePoolSize threads will ever be created.
Executors.newCachedThreadPool() will create an executor with a zero-sized queue, so the queue is always full. This means that it will create a new thread (or re-use an idle one) as soon as you submit the task. As such, it's not a good demonstration of what the core/max/queue parameters are for.
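The interaction is easy to observe with a small bounded-queue pool (the sizes are chosen only to make the transitions visible):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {
    public static void main(String[] args) {
        // core = 2, max = 4, queue capacity = 2
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 30L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(2));

        // Tasks 1-2 get core threads, tasks 3-4 are queued, tasks 5-6 force
        // extra threads because the queue is full, and tasks 7-8 are rejected
        // (max size reached and queue full), per the rules quoted above.
        for (int i = 1; i <= 8; i++) {
            final int id = i;
            try {
                pool.execute(() -> {
                    try { Thread.sleep(2_000); } catch (InterruptedException ignored) { }
                });
                System.out.println("task " + id + " accepted, pool size = " + pool.getPoolSize());
            } catch (RejectedExecutionException e) {
                System.out.println("task " + id + " rejected");
            }
        }
        pool.shutdown();
    }
}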
