What's the benefit of using reactive programming over ExecutorService? - java

If both are asynchronous in nature, what is the benefit of using reactive programming over an ExecutorService in Java? In what ways is reactive programming more effective than an ExecutorService?

Asynchronous programming usually includes some kind of task interaction, and different styles of asynchronous programming provide different kinds of it.
An ExecutorService executes submitted tasks as soon as a processor is available; that is, it provides only the simplest form of asynchronous programming, with no task interaction at all.
Reactive programming provides channels to exchange messages with backpressure, which is a far more advanced kind of task interaction. But under the hood, it still uses an ExecutorService.
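The difference can be sketched with only the JDK: since Java 9, java.util.concurrent.Flow and SubmissionPublisher implement the Reactive Streams contract. In this minimal sketch (the class and method names are mine, not from any framework), the subscriber signals demand one item at a time, and SubmissionPublisher.submit blocks when its buffer fills; that is the backpressure a bare ExecutorService does not provide:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class BackpressureSketch {
    // A subscriber that requests items one at a time, so the
    // publisher can never run ahead of the consumer.
    static List<Integer> consumeWithBackpressure(int count) throws InterruptedException {
        List<Integer> received = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<Integer>() {
                private Flow.Subscription subscription;
                public void onSubscribe(Flow.Subscription s) {
                    subscription = s;
                    s.request(1);            // demand exactly one item
                }
                public void onNext(Integer item) {
                    received.add(item);
                    subscription.request(1); // ask for the next only when ready
                }
                public void onError(Throwable t) { done.countDown(); }
                public void onComplete() { done.countDown(); }
            });
            for (int i = 0; i < count; i++) {
                publisher.submit(i);         // blocks if the subscriber's buffer is full
            }
        }                                    // close() signals onComplete
        done.await();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(consumeWithBackpressure(5)); // [0, 1, 2, 3, 4]
    }
}
```

An ExecutorService would accept all five submissions immediately; here the consumer's request(1) calls pace the producer.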

Related

Kotlin Concurrency against Goroutine, Task, CompletableFuture

People always show the same example in articles about concurrency with Kotlin coroutines or Go goroutines:
create 100_000 threads in Java or C# and, oops, StackOverflow.
Yes, but who uses Thread classes directly in Java or C#?
In Java and C#, there are thread pools behind CompletableFuture and Task.
When we try to create 100_000 Tasks or CompletableFutures, we can do that easily with an ExecutorService/ForkJoinPool or the .NET default thread pool. They will reuse threads, and if no thread is available, tasks wait in the queue.
My questions:
Yes, structured concurrency is good for cancellation. But Kotlin uses a thread pool just like CompletableFuture, and unlike Java callbacks it provides natural, sequential-looking syntax. Is syntax the only difference between Kotlin coroutines and C# Task or Java CompletableFuture?
Kotlin runs on the JVM, and as far as I know the JVM doesn't support green threads. But people talk as if Kotlin uses green threads. How is that possible on the JVM? And why are coroutines called lightweight threads? By that logic we could call CompletableFuture and Task lightweight threads too, right?
Yes, Go has a scheduler. Goroutines are user-level threads. When we create a goroutine it goes to a local run queue, and a dedicated OS thread takes goroutines one by one from that queue and executes them. There are no OS context switches; they all run on the same OS thread until one blocks. Goroutines are cheap, so yes, we can say goroutines are lightweight threads.
Maybe I'm completely wrong about coroutines. Please correct me.
Making things simple:
Thread - easy to use, because it is sequential. Reading from the network and writing to a file is as simple as writeToDisk(readFromNetwork()). On the other hand, it is expensive to run each task in a separate thread.
Executors/CompletableFuture - more performant; it makes better use of both CPU and memory. On the other hand, it requires callbacks, and the code quickly becomes hard to read and maintain.
Coroutines - both sequential and performant.
I ignore other features of coroutines, like structured concurrency, because that was not your main concern.
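The contrast between the first two bullets can be sketched in plain Java with CompletableFuture (the readFromNetwork/writeToDisk stand-ins are mine, purely for illustration):

```java
import java.util.concurrent.CompletableFuture;

public class StyleComparison {
    // Blocking style: sequential and easy to read, but occupies a thread
    // for the whole duration of the I/O.
    static String blockingStyle() {
        String data = readFromNetwork();  // thread waits here
        return writeToDisk(data);
    }

    // Callback style: frees the calling thread, but the logic is
    // inverted into a chain of continuations.
    static CompletableFuture<String> callbackStyle() {
        return CompletableFuture.supplyAsync(StyleComparison::readFromNetwork)
                                .thenApply(StyleComparison::writeToDisk);
    }

    // Hypothetical stand-ins for real I/O.
    static String readFromNetwork() { return "payload"; }
    static String writeToDisk(String data) { return "wrote:" + data; }

    public static void main(String[] args) {
        System.out.println(blockingStyle());         // wrote:payload
        System.out.println(callbackStyle().join());  // wrote:payload
    }
}
```

A coroutine (or virtual thread) lets you keep the first, sequential shape while getting the thread economy of the second.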

Read a java "Queue" from multiple threads with all objects delivered to each thread

Is there a standard or third-party implementation of java.util.Queue which allows me to read a queue concurrently from multiple threads but delivers all objects to all threads?
The objective is to do parallel processing of messages being regularly added to a queue. Some consumers are fast while others are slow, and we need every message to be processed by every thread.
This task is solved by libraries which support the Reactive Streams initiative, e.g. Project Reactor or RxJava. As far as I know, they all support only asynchronous producers and consumers, not threads. But I believe it is easy to create adapters from the asynchronous (non-blocking) to the synchronous (blocking) way of communication. And the asynchronous solution is probably better in your case.
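The multicast part ("all objects to all threads") is also available in the JDK itself: SubmissionPublisher delivers every submitted item to every subscriber, each on the publisher's executor. A minimal sketch (names are mine):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class MulticastQueue {
    // Deliver every submitted message to every subscriber.
    static List<List<Integer>> broadcast(int consumers, int messages) throws InterruptedException {
        List<List<Integer>> results = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(consumers);
        try (SubmissionPublisher<Integer> bus = new SubmissionPublisher<>()) {
            for (int c = 0; c < consumers; c++) {
                List<Integer> sink = Collections.synchronizedList(new ArrayList<>());
                results.add(sink);
                bus.subscribe(new Flow.Subscriber<Integer>() {
                    public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
                    public void onNext(Integer item) { sink.add(item); }
                    public void onError(Throwable t) { done.countDown(); }
                    public void onComplete() { done.countDown(); }
                });
            }
            for (int i = 0; i < messages; i++) {
                bus.submit(i);  // fanned out to every subscriber
            }
        }
        done.await();
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        // Each of the 3 consumers receives all 4 messages, in order.
        System.out.println(broadcast(3, 4));
    }
}
```

A slow subscriber only delays its own buffer, not the other subscribers, which matches the "some processes are fast while others are slow" requirement.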

Parallel job execution with split-and-aggregate in Java

We are working on rewrite of an existing application, and need support for high number of read/write to database. For this, we are proceeding with sharding on MySQL. Since we are allowing bulk APIs for read/write, this would mean parallel execution of queries on different shards.
Can you suggest frameworks which would support the same in Java, mainly focussing on split-and-aggregate jobs. Basically I will define two interfaces ReadTask and WriteTask, and implementation of these tasks will be jobs and they would be submitted as a list for parallel execution.
I might not have termed this question in the right way, but I hope you got the context from the description. Let me know if there is any info needed for answer.
BLUF: This sounds like a common processing pattern in Akka.
This sounds like a Scatter-Gather patterned API.
If you have one job, you should first decide whether that job will touch only one shard or several. If it will touch many shards, you may choose to reject it (allowing only single-shard actions), or you may choose to break it up (scatter) across other workers.
Akka gives you APIs, especially the Streaming API, that talk about this style of work. Akka is best expressed in Scala, but it has a Java API that gives you all the functionality of the Scala one. Since you are talking about "mapping" and "reducing" (or "folding") data, these are functional operations, and Scala gives you the functional idioms.
If you scatter it across other workers, you'll need to communicate the manifest of jobs to the gather side of the system.
Hope that's helpful.
You can use ThreadPoolExecutor and the Executors factory in Java to create thread pools to which you can submit your read and write tasks. They accept both Runnable and Callable, depending on your situation.
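A minimal scatter-gather sketch with a plain ExecutorService (the ReadTask shape and queryShard stand-in are hypothetical, matching the interfaces the question proposes): invokeAll scatters one task per shard and blocks until all partial results are ready to aggregate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class ScatterGather {
    // Scatter one bulk read across shards, then gather the partial results.
    static List<String> bulkRead(List<Integer> shardIds) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<String>> tasks = shardIds.stream()
                    .map(id -> (Callable<String>) () -> queryShard(id)) // one task per shard
                    .collect(Collectors.toList());
            List<String> results = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {  // scatter, then wait for all
                try {
                    results.add(f.get());                     // gather: aggregate in shard order
                } catch (ExecutionException e) {
                    results.add("error");                     // per-shard failure handling
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical stand-in for a real shard query.
    static String queryShard(int id) { return "shard-" + id; }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(bulkRead(List.of(1, 2, 3))); // [shard-1, shard-2, shard-3]
    }
}
```

invokeAll returns futures in the same order as the submitted task list, so the gather step stays deterministic even though shard queries run in parallel.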

Reactive Programming Advantages/Disadvantages

I keep studying and trying the reactive style of coding using Reactor and RxJava. I understand that reactive coding makes better utilization of the CPU than single-threaded execution.
Is there any concrete comparison between reactive programming and imperative programming in web-based applications?
How much performance gain or throughput do I achieve by using reactive programming over non-reactive programming?
Also what are the advantages and disadvantages of Reactive Programming?
Is there any statistical benchmark?
Well, reactive programming means you do all your IO-bound tasks, such as network calls, asynchronously. For instance, say your application calls an external REST API or a database; you can do that invocation asynchronously. If you do so, your current thread does not block, and you can serve lots of requests by spawning only one or a few threads. With a blocking approach, you need one thread to handle each and every request. You may refer to my multi-part blog post (part one, part two and part three) for further details.
Other than that, you could use callbacks to achieve the same asynchronous invocation. But if you do, you may end up in callback hell: one callback nested inside another quickly leads to very complex code that is hard to maintain. RxJava, on the other hand, lets you write asynchronous code that is much simpler, composable and readable. RxJava also provides lots of powerful operators such as map, zip, etc., which keep your code simple while boosting performance through parallel execution of tasks that do not depend on each other.
RxJava is not just another Observer implementation with a set of operators; it also gives you good error-handling and retry mechanisms, which are really handy.
I have not benchmarked RxJava against an imperative approach, so I cannot give you statistics, but I am pretty sure RxJava should perform well compared to blocking mechanisms.
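The zip idea mentioned above has a JDK-only analogue in CompletableFuture.thenCombine: two independent async calls run in parallel and are combined when both complete. A minimal sketch (the user/orders values are made up for illustration):

```java
import java.util.concurrent.CompletableFuture;

public class ZipSketch {
    // Two independent async calls, combined when both complete;
    // analogous to RxJava's zip operator, using only the JDK.
    static String zipCalls() {
        CompletableFuture<String> user = CompletableFuture.supplyAsync(() -> "alice");
        CompletableFuture<Integer> orders = CompletableFuture.supplyAsync(() -> 3);
        return user.thenCombine(orders, (u, n) -> u + " has " + n + " orders").join();
    }

    public static void main(String[] args) {
        System.out.println(zipCalls()); // alice has 3 orders
    }
}
```

Neither call waits for the other, which is the "parallel execution of tasks that do not depend on each other" the answer describes.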
Update
Since I gathered more experience over time, I thought of adding more points to my answer.
Based on the article, ReactiveX is a library for composing asynchronous and event-based programs by using observable sequences. I recommend you go through this introductory article first.
These are some properties of reactive systems: event-driven, scalable, resilient, responsive.
When it comes to RxJava, it offers two main facilities to a programmer. First, it offers a nice composable API with a rich set of operators such as zip, concat, map, etc. This yields simpler, more readable code, and when it comes to code, readability and simplicity are the most important properties. Second, it provides excellent abstractions that enable concurrency to become declarative.
A popular misconception is that Rx is multithreaded by default. In fact, Rx is single-threaded by default; if you want to do things asynchronously, you have to say so explicitly using the subscribeOn and observeOn operators, passing the relevant schedulers. RxJava gives you thread pools for asynchronous tasks. There are several schedulers, such as io and computation: the IO scheduler, as the name suggests, is best suited for IO-intensive work such as network calls, while the computation scheduler is good for CPU-intensive work. You can also hook up your own executor services with RxJava. The built-in schedulers mainly save you from maintaining your own executor services, keeping your code simpler.
Finally a word on subscribeOn and observeOn
In the Rx world, there are generally two things you want to control the concurrency model for:
The invocation of the subscription
The observing of notifications
subscribeOn: specify the Scheduler on which an Observable will operate.
observeOn: specify the Scheduler on which an observer will observe this Observable.
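The subscribeOn/observeOn split can be sketched without RxJava by passing different executors to different CompletableFuture stages; this is only an analogy, assuming a made-up two-stage pipeline, not RxJava's actual API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SchedulerSketch {
    static String run() {
        // Dedicated pool for IO-style work, loosely like Schedulers.io()
        ExecutorService ioPool = Executors.newCachedThreadPool();
        // CPU-sized pool, loosely like Schedulers.computation()
        ExecutorService computePool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            return CompletableFuture
                    .supplyAsync(() -> "data", ioPool)                 // source runs on the IO pool
                    .thenApplyAsync(String::toUpperCase, computePool)  // downstream runs on the compute pool
                    .join();
        } finally {
            ioPool.shutdown();
            computePool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // DATA
    }
}
```

The first executor plays the role of subscribeOn (where the source work happens), the second that of observeOn (where the result is consumed).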
Disadvantages
More memory-intensive, since it usually stores streams of data over time.
Might feel unconventional to learn at the start (everything needs to be a stream).
Most complexities have to be dealt with at the time of declaring new services.
Lack of good and simple learning resources.
Often confused with functional reactive programming.
Apart from the non-blocking features already mentioned in other answers, another great feature of reactive programming is the use of backpressure. It is normally used in situations where your publisher emits more information than your consumer can process.
With this mechanism you can control the flow of traffic between the two and avoid nasty out-of-memory problems.
You can see some practical examples of reactive programming here: https://github.com/politrons/reactive
And about back pressure here: https://github.com/politrons/Akka/blob/master/src/main/scala/stream/BackPressure.scala
By the way, the only disadvantage of reactive programming is the learning curve, because you are changing the programming paradigm. But nowadays all the important companies respect and follow the reactive manifesto.
Reactive Programming is a style of micro-architecture involving intelligent routing and consumption of events.
Reactive means that you can do more with less; specifically, you can process higher loads with fewer threads.
Reactive types are not intended to let you process requests or data faster. Their strength lies in their capacity to serve more requests concurrently and to handle operations with latency, such as requesting data from a remote server, more efficiently.
They allow you to provide a better quality of service and predictable capacity planning by dealing natively with time and latency, without consuming more resources.
From
https://blog.redelastic.com/what-is-reactive-programming-bc9fa7f4a7fc
https://spring.io/blog/2016/06/07/notes-on-reactive-programming-part-i-the-reactive-landscape
https://spring.io/blog/2016/07/28/reactive-programming-with-spring-5-0-m1
Advantages
Cleaner, more concise code
Easier to read (once you get the hang of it)
Easier to scale (pipe any operation)
Better error handling
Event-driven inspired -> plays well with streams (Kafka, RabbitMQ, etc.)
Backpressure (client can control flow)
Disadvantages
Can become more memory intensive in some cases
Somewhat steep learning curve
Reactive programming is a kind of imperative programming.
Reactive programming is a kind of parallel programming.
You can achieve a performance gain over single-threaded execution only if you manage to create parallel branches. Whether they are executed by multiple threads or by reactive constructs (which in fact are asynchronous procedures) does not matter.
The single advantage of reactive programming over multithreaded programming is lower memory consumption (each thread requires 0.5-1 megabyte). The disadvantage is that programming becomes less easy.
UPDATE (Aug 2020). Parallel programming comes in two flavours: multithreaded programming, where the main unit of activity is the thread, and asynchronous programming, where the main unit of activity is the asynchronous procedure (including actors, which are repeatable asynchronous procedures).

In multithreaded programming, various means of communication are used: unbounded queues, bounded (blocking) queues, binary and counting semaphores, countdown latches and so on. Moreover, there is always the possibility of creating your own means of communication. In asynchronous programming, until recently, only two kinds of communicators were used: the future, for non-repeatable asynchronous procedures, and the unbounded queue, for actors. The unbounded queue causes problems when the producer works faster than the consumer. To cope with this, a new communication protocol was invented: the reactive stream, a combination of an unbounded queue and a counting (asynchronous) semaphore that makes the queue bounded. This is the direct analogue of the blocking queue in multithreaded programming. And programming with reactive streams was proudly called Reactive Programming (imagine if, in multithreaded programming, programming with blocking queues were called Blocking Programming). But again, no means to create their own communication tools were given to asynchronous programmers, and the asynchronous semaphore cannot be used on its own, only as part of a reactive stream. That said, the theory of asynchronous programming, including the theory of reactive programming, lags far behind the theory of multithreaded programming.
A fancy addition to reactive streams is the mapping/filtering functions, which allow you to write linear pipelines like
publisher
    .map(x -> mappingFunction(x))
    .filter(x -> filterPredicate(x))
    .flatMap(x -> innerPublisher(x))
and so on.
But this is not an exclusive feature of reactive programming, and it allows you to create only linear pipelines, while in multithreaded programming it is easy to create computational graphs of arbitrary topology.
 

java fork-join executor usage for db access

The ForkJoinTask documentation explicitly calls out that "subdividable tasks should also not perform blocking I/O"; its primary aim is "computational tasks calculating pure functions or operating on purely isolated objects". My questions are:
Why design the ForkJoinTask to restrict blocking IO tasks?
What are the gotchas if i do implement a blocking IO task?
How come both the Spring and Play frameworks are full of examples using fork-join executors for DB calls?
In my scenario, a single request does two types of work: encryption, which pushes a CPU core to 100% for 200 ms, and a few database calls. Any kind of static partitioning, such as 6 threads for encryption and 2 threads for blocking IO, will not give optimal usage of the CPU. Hence a fork-join executor, with some over-provisioning in the number of threads over the total CPU count, coupled with work stealing, should ensure better usage of CPU resources.
Is my assumption and understanding of the fork-join executor correct? If not, please point me to the gap.
Why design the ForkJoinTask to restrict blocking IO tasks?
Underlying the fork-join pool is a shared set of threads. If some IO work blocks on those threads, fewer threads are left for CPU-intensive work, and other non-blocking work will starve.
What are the gotchas if i do implement a blocking IO task?
Typically, a ForkJoinPool allocates about as many threads as there are processors, so if you do have to block on IO, make sure you allocate enough threads for your other tasks.
You can also isolate your IO work on dedicated threads that are not shared with the fork-join pool. But when you call blocking IO, your thread blocks and the processor is given to other tasks until it unblocks.
How come both spring and play frameworks, are full of examples using fork-join executors for DB calls?
Play is no different: it uses dedicated pools for IO tasks, so other tasks don't suffer.
The framework does not restrict any type of processing; blocking and the like are simply not recommended. I wrote a critique of this framework years ago; here is the point on the recommendations. That was for the Java 7 version, but it still applies to Java 8.
Blocking is not fatal: Spring and Play block, and they work just fine. You need to be careful with Java 8, since there is a default common fork/join pool, and tying up its threads may have consequences for other users. You could always define your own fork/join pool, with the additional overhead, but at least you wouldn't interfere with others using the common pool.
Your scenario doesn't look bad; you're not waiting for replies from the internet. Give it a try. If you run into difficulty with stalling threads, look into the ForkJoinPool.ManagedBlocker interface. Using that interface informs the fork/join pool that you are doing blocking calls, and the framework will create compensation threads.
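A minimal sketch of ForkJoinPool.ManagedBlocker (the callBlocking wrapper and the "db-row" result are mine, standing in for a real database call): the pool is told about the blocking section and may spawn a compensation thread so parallelism is not lost while we wait.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;

public class ManagedBlockerSketch {
    // Wrap a blocking call so the fork/join pool can create a
    // compensation thread while we wait.
    static <T> T callBlocking(Callable<T> blockingCall) throws InterruptedException {
        var blocker = new ForkJoinPool.ManagedBlocker() {
            T result;
            boolean done = false;
            public boolean block() throws InterruptedException {
                try {
                    result = blockingCall.call();  // the actual blocking section
                } catch (InterruptedException e) {
                    throw e;
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
                done = true;
                return true;                       // no further blocking needed
            }
            public boolean isReleasable() { return done; }
        };
        ForkJoinPool.managedBlock(blocker);        // notifies the pool before blocking
        return blocker.result;
    }

    public static void main(String[] args) {
        // Run inside the common pool, as a fork/join task would.
        String row = ForkJoinPool.commonPool().submit(() -> {
            try {
                // Hypothetical stand-in for a slow DB query.
                return callBlocking(() -> { Thread.sleep(50); return "db-row"; });
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }).join();
        System.out.println(row); // db-row
    }
}
```

On a non-fork/join thread, managedBlock simply runs the block() method, so the wrapper is safe to call from anywhere.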