So what I have understood from the docs is that a parallel Flux essentially divides the flux elements into separate rails (essentially something like grouping). As far as threading is concerned, that is the job of schedulers, and all of this will run on the scheduler instance provided via the runOn() method.
Let's consider a situation like the one below:
Mono<Response> response = webClientCallApi(...); // function returning a Mono from a WebClient call
Now let's say we make around 100 calls:
Flux.range(0, 100).subscribeOn(Schedulers.boundedElastic()).flatMap(i -> webClientCallApi(i)).collectList() // or subscribe somehow
and if we use a ParallelFlux:
Flux.range(0, 100).parallel().runOn(Schedulers.boundedElastic()).flatMap(i -> webClientCallApi(i)).sequential().collectList();
So if my understanding is correct, the two pretty much seem to be similar. What are the advantages of ParallelFlux over Flux, and when should you use ParallelFlux over Flux?
In practice, you'll likely very rarely need to use a parallel flux, including in this example.
In your example, you're firing off 100 web service calls. Bear in mind the actual work needed to do this is very low - you generate and fire off an asynchronous request, and then some time later you receive a response back. In between that request and response you're not doing any work at all; it simply takes a tiny amount of CPU resources when each request is sent, and another tiny amount when each response is received. (This is one of the core advantages of using an asynchronous framework to make your web requests: you're not tying up any threads while the request is in-flight.)
If you split this flux and run it in parallel, you're saying that you want these tiny amounts of CPU resources to be split so they can run simultaneously, on different CPU cores. This makes absolutely no sense - the overhead of splitting the flux, running it in parallel and then combining it later is going to be much, much greater than just leaving it to execute on a normal, sequential scheduler.
On the other hand, let's say I had a Flux<Integer> and I wanted to check if each of those integers was a prime for example - or perhaps a Flux<String> of passwords that I wanted to check against a BCrypt hash. Those sorts of operations are genuinely CPU intensive, so in that case a parallel flux, used to split execution across cores, could make a lot of sense. In reality though, those situations occur quite rarely in the normal reactor use cases.
(Also, just as a closing note, you almost always want to use Schedulers.parallel() with a parallel flux, not Schedulers.boundedElastic().)
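To make that concrete, here is a minimal sketch of the genuinely CPU-bound case, assuming a hypothetical isPrime(int) helper, and using Schedulers.parallel() per the note above:

Flux.range(2, 1_000_000)
    .parallel()                    // split into rails, one per CPU core by default
    .runOn(Schedulers.parallel())  // CPU-bound work belongs on the parallel scheduler
    .filter(i -> isPrime(i))       // isPrime is a hypothetical CPU-intensive check
    .sequential()
    .subscribe(System.out::println);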
Small question regarding Spring Webflux and project Reactor please.
From the official doc, we can see:
https://docs.spring.io/spring-framework/docs/current/reference/html/web-reactive.html#threading-model
Threading Model
What threads should you expect to see on a server running with Spring WebFlux?
On a “vanilla” Spring WebFlux server (for example, no data access nor other optional dependencies), you can expect one thread for the server and several others for request processing (typically as many as the number of CPU cores). Servlet containers, however, may start with more threads (for example, 10 on Tomcat), in support of both servlet (blocking) I/O and servlet 3.1 (non-blocking) I/O usage.
What happens if the hardware has only one CPU, please?
I have a webapp which takes a Flux of strings as input and performs a heavy operation on it.
Please note, the heavy operation is non-blocking. It has been BlockHound tested and, for sure, does not contain any database or web-call I/O.
Yet the computation is heavy and lengthy (but again, non-blocking).
What heavyComputation does is take the string, perform some in-memory decryption, convert it to some objects, check some fields against a BCrypt hash, and re-encrypt it in memory.
The heavyComputation is very heavy and takes up to 5 seconds to complete for one string.
@GetMapping(value = "/upload-flux", consumes = MediaType.MULTIPART_FORM_DATA_VALUE, produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> question(@RequestPart("question") Flux<String> stringFlux) {
    return stringFlux.map(oneString -> heavyComputation(oneString));
}

private String heavyComputation(String oneString) {
    // heavy and time-consuming in-memory decryption
    // heavy and time-consuming conversion to a Java object
    // heavy and time-consuming validation of fields against a hash
    // heavy and time-consuming re-encryption
    // return the re-encrypted string
}
I was hoping that by using Spring WebFlux I could get some concurrency and asynchrony, since the hardware is constrained to only one CPU.
Sadly, I observe that everything is done on the reactor-http-nio-1 thread, and it looks fairly sequential: first one string, then the heavyComputation which takes 5-ish seconds, then the second string, and so on.
What am I doing wrong please?
Thank you
WebFlux is built around the concept that new incoming requests don't spawn new threads (*) like traditional web servers such as servlet containers do. Instead, requests get queued to be processed by a single long-running thread (assuming a single CPU core), similar to how e.g. click events are processed in JavaScript or desktop UI libraries. The benefit of that is that the CPU is freed from much of the overhead associated with managing threads, which is very expensive. It gets to do one job after another instead of creating the illusion that it can do thousands of jobs at once.
This doesn't magically make your CPU go faster, it just makes it waste less time with thread context switching, which is notoriously expensive. Your CPU-bound computations need as many CPU cycles as they need, no matter what thread they run on. Also, with WebFlux, if request processing involves long-running CPU intensive computation, this means that the CPU can't process new requests until it's finished with the current one - unless you explicitly offload it to a worker thread (and wrap it in a Mono). This will however effectively nullify the benefits of the reactive model if those CPU-bound computations are what the application will be busy with most of the time, because the CPU will now yet again have to do thread context switching to alternately assign CPU time to the request processing and the newly spawned worker thread. Or worse yet, it will have to juggle multiple parallel such worker threads as they get spawned through new requests.
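For the upload-flux example above, that explicit offloading could look roughly like this minimal sketch, reusing the heavyComputation(String) method from the question (with all the context-switching caveats just described):

return stringFlux.flatMap(oneString ->
        Mono.fromCallable(() -> heavyComputation(oneString))
            .subscribeOn(Schedulers.parallel())); // parallel() for CPU-bound work

On a single-core box this frees the event loop to accept new requests, but as explained above it doesn't increase total throughput.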
You can expect performance gains from WebFlux if your application needs to process a very large number of requests per second, but where individual requests need very few CPU cycles to process and I/O is non-blocking. Your use case seems to be the opposite, so the Reactive model might not actually do anything for you compared to the simpler Servlet model.
If, on the other hand, your use case is such that the CPU-bound work can be parallelized, you will need multiple (or at least hyperthreading-enabled) CPU cores to benefit from that. Reactive can't help you there.
(*) Yes, that's an oversimplification, I'm aware of thread pools, I'm just trying to get the point across.
I have a use case where I'm calling four separate downstream endpoints and they can all be called in parallel. After every call is completed, I return a container object from the lambda function, its only purpose being to contain the raw responses from the downstream calls on it. From there, the container object will be transformed into the required model for the consumer.
Here's the structure of the code, roughly speaking:
Observable.zip(o1, o2, o3, o4,
        (resp1, resp2, resp3, resp4) ->
                new RawResponseContainer(resp1, resp2, resp3, resp4))
    .toBlocking().first();
Is there a better way to do this? I 100% need every observable to complete; otherwise, the transformation of the consumer model will be incomplete. While I suppose I could transform each individual response from each observable "on the fly", rather than waiting to transform every response at once, I still need every call to finish before the transformation's done.
I've read it's a bad practice to ever use toBlocking() when using rx (aside from for 'legacy' apps), so any help's appreciated.
This is not an answer, just a comment:
You are essentially asking a sequential- vs. parallel-processing question. What you are doing now is sequential processing (by blocking); what is recommended is parallel. Which is better depends entirely on the context. In your case you need all the responses, so even in the parallel model everything has to complete successfully. If even one call fails, the entire processing is for naught: in parallel, the work of the other three calls would go to waste, while in sequential the error would surface in the middle and stop further calls. If you can live with the latency that sequential processing brings, stay with it; sequential implementations are (in general) less complicated.
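If you do want to avoid toBlocking(), a minimal non-blocking sketch using the o1..o4 observables from the question (transformForConsumer and handleError are hypothetical placeholders):

Observable<RawResponseContainer> combined = Observable.zip(o1, o2, o3, o4,
        (resp1, resp2, resp3, resp4) ->
                new RawResponseContainer(resp1, resp2, resp3, resp4));
// Subscribe (or return `combined` to the caller) instead of blocking:
combined.subscribe(
        container -> transformForConsumer(container), // hypothetical transformation
        error -> handleError(error));                 // hypothetical error handling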
I am using AWS Lambda with Java. Due to a requirement, I have to sleep in my Lambda function for 2-3 seconds, or in some cases up to 12 seconds. Is it a good idea to put Thread.sleep() in a Lambda function, or does it have any technical consequences?
There are a few cases in which doing Thread.sleep is justified:
Polling every few seconds to check whether some status outside your code's control has changed, e.g. checking whether a remote process somewhere has finished.
You want to mock a certain piece of code so that it "takes" more time than it actually does.
Throttling down a piece of code that does multiple operations per second, e.g. requesting multiple resources from a remote server but throttling your requests so that you don't overload it.
I'm sure there are quite a few more justifiable reasons. Don't be afraid to sleep in your code; just make sure you're sleeping for a justifiable reason, and make sure your threading model, in which you do need to sleep, does not cause deadlocks.
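For the polling case, a minimal sketch (remoteJobFinished() is a hypothetical status check):

static void waitForRemoteJob() throws InterruptedException {
    // Poll a status outside our control every 3 seconds until it changes.
    while (!remoteJobFinished()) {
        Thread.sleep(3_000);
    }
}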
Note that when running in AWS Lambda you should keep your sleeps as short as possible, as you pay for that sweet, sweet CPU time.
If your Lambda uses a high amount of memory, it would be better (and cheaper) to start two different Lambdas than to wait for 12 seconds.
If you have a sort of workflow, or you need to wait for a specific condition, you could evaluate introducing AWS Step Functions or (maybe better) sending the context to an SQS queue with the visibility timeout set to twelve seconds. That way, the second Lambda will wait at least 12 seconds before it starts.
Basically you can do whatever you want, in this case you will just pay more :-)
The whole idea of Lambda function is to have a function that takes input and produces output and have a single responsibility, similar to plain old functions.
Let's think about why you need to use Thread#sleep:
You perform action #1.
Wait until this action is completed.
Perform action #2.
These are 3 different responsibilities. It's too much for any function, including Lambda :-)
Both actions can be separate Lambda functions. With the recent addition of Lambda Destinations, your Lambda #1 can trigger Lambda #2.
In this case there is no need in polling at all.
Assuming all I want to do is call a service at a particular rate, say once per second, what advantages does Guava's RateLimiter offer over a simple Thread.sleep(1000)?
The point of RateLimiter is you make it part of the service (or wrap the service) being called, so it can protect itself from being called too frequently. Your Thread#sleep alternative would have to be used by the client of the service, so you're comparing two different things.
Here's a good article on what you can do with RateLimiter.
I'm not a RateLimiter expert, but here's a few points I'd like to make anyway:
One of the main benefits of RateLimiter is its ability to control the rate of requests when requests are being made (or coming in) from multiple places, typically on multiple threads.
If you're making all the calls to this service sequentially on a single thread, RateLimiter probably isn't necessary... that's a much simpler case than it's designed for.
Still, (as others have mentioned) it's going to do a better job at accurately limiting you to one request per second than Thread.sleep(1000) will, since that sleep isn't taking into account the time it takes to do any work that's done when making the request.
It's unclear to me whether you're actually trying to rate limit your calls to the service or if what you actually want is a scheduled task that happens once per second. If it's the latter, something like ScheduledExecutorService or ListeningScheduledExecutorService might be preferable.
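For the scheduled-task interpretation, a minimal sketch (callService() is a hypothetical stand-in for the real call):

ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
// Run the call once per second, regardless of how long each call takes to set up.
scheduler.scheduleAtFixedRate(() -> callService(), 0, 1, TimeUnit.SECONDS);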
Use RateLimiter, since it fits your use case of limiting access to a service exactly. An excerpt from the JavaDoc:
Rate limiters are often used to restrict the rate at which some physical or logical resource is accessed.
Of course, you could use Thread.sleep instead, but then you either have to program the functionality that tracks when your service was last called yourself, or you have to block every execution of your service indiscriminately (possibly blocking unnecessarily on the first or last execution).
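A minimal sketch of that tracking being done for you, using com.google.common.util.concurrent.RateLimiter (callService() is a hypothetical stand-in for the real call):

RateLimiter limiter = RateLimiter.create(1.0); // 1 permit per second
for (int i = 0; i < 10; i++) {
    limiter.acquire(); // blocks only as long as needed to honor the rate
    callService();     // hypothetical service call
}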
The difference is latency.
The simplest approach, calling Thread.sleep(1000) on every request, would slow every request down by at least one second.
The Guava rate limiter will check how many requests have been seen before deciding to block. Thus many calls may get through with essentially no latency.
Of course, a smarter implementation can be written than the naive approach that blocks every request using Thread.sleep. However at that point, one would be re-inventing the Guava approach.
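For illustration, that "smarter" naive approach would have to account for the request's own duration, roughly like this sketch (makeRequest() is hypothetical; assume the enclosing method declares throws InterruptedException):

long start = System.nanoTime();
makeRequest(); // hypothetical request
long elapsedMs = (System.nanoTime() - start) / 1_000_000;
if (elapsedMs < 1000) {
    Thread.sleep(1000 - elapsedMs); // sleep only for the remainder of the window
}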
I'm writing a simple utility which accepts a collection of Callable tasks and runs them in parallel. The hope is that the total time taken is little more than the time taken by the longest task. The utility also adds some error-handling logic: if any task fails, and the failure is something that can be treated as "retry-able" (e.g. a timeout, or a user-specified exception), then we run the task directly.
I've implemented this utility around an ExecutorService. There are two parts:
submit() all the Callable tasks to the ExecutorService, storing the Future objects.
in a for-loop, get() the result of each Future. In case of exceptions, apply the "retry-able" logic (roughly as the sketch below shows).
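A minimal sketch of that two-part structure, assuming java.util and java.util.concurrent imports and a hypothetical RetryableException marker for failures worth retrying:

static <T> List<T> runAll(List<Callable<T>> tasks, ExecutorService executor) throws Exception {
    List<Future<T>> futures = new ArrayList<>();
    for (Callable<T> task : tasks) {
        futures.add(executor.submit(task)); // part 1: submit everything up front
    }
    List<T> results = new ArrayList<>();
    for (int i = 0; i < futures.size(); i++) {
        try {
            results.add(futures.get(i).get()); // part 2: collect results in order
        } catch (ExecutionException e) {
            if (e.getCause() instanceof RetryableException) {
                results.add(tasks.get(i).call()); // retry by running the task directly
            } else {
                throw e;
            }
        }
    }
    return results;
}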
I wrote some unit tests to ensure that using this utility is faster than running the tasks in sequence. For each test, I'd generate a certain number of Callables, each essentially performing a Thread.sleep() for a random amount of time within a bound. I experimented with different timeouts, different numbers of tasks, etc., and the utility seemed to outperform sequential execution.
But when I added it to the actual system which needs this kind of utility, I saw results that were very variable - sometimes the parallel execution was faster, sometimes it was slower, and sometimes it was faster, but still took a lot more time than the longest individual task.
Am I just doing it all wrong? I know ExecutorService has invokeAll() but that swallows the underlying exceptions. I also tried using a CompletionService to fetch task results in the order in which they completed, but it exhibited more or less the same behavior. I'm reading up now on latches and barriers - is this the right direction for solving this problem?
I wrote some unit tests to ensure that using this utility is faster than running the tasks in sequence. For each test, I'd generate a certain number of Callables, each essentially performing a Thread.sleep() for a random amount of time within a bound
Yeah this is certainly not a fair test since it is using neither CPU nor IO. I certainly hope that parallel sleeps would run faster than serial. :-)
But when I added it to the actual system which needs this kind of utility, I saw results that were very variable
Right. Whether or not a threaded application runs faster than a serial one depends a lot on a number of factors. In particular, IO bound applications will not improve in performance since they are bound by the IO channel and really cannot do concurrent operations because of this. The more processing that is needed by the application, the larger the win is to convert it to be multi-threaded.
Am I just doing it all wrong?
Hard to know without more details. You might consider playing around with the number of threads that run concurrently. If you have a ton of jobs to process, you should not be using Executors.newCachedThreadPool(), and you should tune Executors.newFixedThreadPool(...) depending on the number of CPUs your architecture has.
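For example (a sketch; the right size depends on whether the tasks are CPU- or IO-bound):

int cores = Runtime.getRuntime().availableProcessors();
ExecutorService executor = Executors.newFixedThreadPool(cores); // bounded pool, sized to the CPUs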
You also may want to see if you can isolate the IO operations to a few threads and the processing to other threads, like one input thread reading from a file and one output thread (or a couple) writing to the database. So multiple differently-sized pools may do better for different types of tasks than a single thread-pool.
tried using a CompletionService to fetch task results in the order in which they completed
If you are retrying operations, using a CompletionService is exactly the way to go. As jobs finish and throw exceptions (or return failure), they can be restarted and put back into the thread-pool immediately. I don't see any reason why your performance problems would be because of this.
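A minimal sketch of that resubmit-on-failure pattern with an ExecutorCompletionService (Result and handle() are hypothetical; assume the enclosing method handles InterruptedException):

Map<Future<Result>, Callable<Result>> inFlight = new HashMap<>();
CompletionService<Result> cs = new ExecutorCompletionService<>(executor);
for (Callable<Result> task : tasks) {
    inFlight.put(cs.submit(task), task);
}
int remaining = tasks.size();
while (remaining > 0) {
    Future<Result> done = cs.take(); // blocks until the next task finishes
    Callable<Result> task = inFlight.remove(done);
    try {
        handle(done.get()); // hypothetical result handling
        remaining--;
    } catch (ExecutionException e) {
        inFlight.put(cs.submit(task), task); // restart and put back into the pool immediately
    }
}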
Multi-threaded programming doesn't come for free. It has an overhead, and that overhead can easily exceed the performance gain and usually makes your code more complex.
Additional threads give access to more CPU power (assuming you have spare CPUs), but in general they won't make your HDD spin faster, give you more network bandwidth, or speed up anything that is not CPU-bound.
Multiple threads can help give you a greater share of an external resource.