Reactive Programming Advantages/Disadvantages - java

I have been studying and trying out the reactive style of coding using Reactor and RxJava. I understand that reactive code makes better use of the CPU than single-threaded execution.
Is there any concrete comparison between reactive programming and imperative programming in web-based applications?
How much performance gain or throughput can I achieve by using reactive programming over non-reactive programming?
Also what are the advantages and disadvantages of Reactive Programming?
Is there any statistical benchmark?

Well, reactive programming means you do all your IO-bound tasks, such as network calls, asynchronously. For instance, say your application calls an external REST API or a database; you can make that invocation asynchronously. If you do so, your current thread does not block, and you can serve lots of requests with just one or a few threads. If you follow the blocking approach, you need one thread to handle each and every request. You may refer to my multi-part blog post (part one, part two and part three) for further details.
Other than that, you may use callbacks to do the same: you can make asynchronous invocations using callbacks. But if you do so, you may sometimes end up in callback hell. Nesting one callback inside another leads to very complex code that is hard to maintain. RxJava, on the other hand, lets you write asynchronous code that is much simpler, more composable and more readable. RxJava also provides lots of powerful operators such as map and zip, which make your code simpler while boosting performance through parallel execution of tasks that do not depend on each other.
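To make the "compose instead of nesting callbacks" point concrete without pulling in RxJava, here is a rough sketch using only the JDK's CompletableFuture. thenCombine plays a role similar to zip for two independent calls. The fetchUser and fetchOrderCount methods are hypothetical stand-ins for remote calls, and the sleeps only simulate latency.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ComposeDemo {
    // Hypothetical stand-ins for two independent remote calls.
    static String fetchUser(int id) { pause(100); return "user-" + id; }
    static int fetchOrderCount(int id) { pause(100); return id * 2; }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts both calls at once and combines the results when both are done.
    // No callback is nested inside another, and a failure in either call
    // ends up in the single exceptionally() handler.
    static CompletableFuture<String> loadSummaryAsync(int id, Executor pool) {
        CompletableFuture<String> user =
            CompletableFuture.supplyAsync(() -> fetchUser(id), pool);
        CompletableFuture<Integer> orders =
            CompletableFuture.supplyAsync(() -> fetchOrderCount(id), pool);
        return user
            .thenCombine(orders, (u, n) -> u + " has " + n + " orders")
            .exceptionally(t -> "summary unavailable");
    }

    // Blocking helper for the demo only; real non-blocking code would
    // hand the future on instead of calling join().
    static String loadSummary(int id) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return loadSummaryAsync(id, pool).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(loadSummary(21)); // prints "user-21 has 42 orders"
    }
}
```

Because the two calls run in parallel, the total wait is roughly one call's latency rather than the sum of both.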
RxJava is not just another Observer implementation with a set of operators; it also gives you good error handling and retry mechanisms, which are really handy.
I have not benchmarked RxJava against the imperative approach, so I cannot give you statistics. But I am fairly sure RxJava should perform well compared with blocking mechanisms.
Update
Since I gathered more experience over time, I thought of adding more points to my answer.
According to the article, ReactiveX is a library for composing asynchronous and event-based programs by using observable sequences. I recommend going through this introductory article first.
These are some properties of reactive systems: Event Driven, Scalable, Resilient, Responsive
When it comes to RxJava, it offers two main facilities to a programmer. First, it offers a nice composable API with a rich set of operators such as zip, concat and map. This yields simpler and more readable code, and readability and simplicity are among the most important properties of code. Second, it provides excellent abstractions that enable concurrency to become declarative.
A popular misconception is that Rx is multithreaded by default. In fact, Rx is single-threaded by default. If you want to do things asynchronously, you have to say so explicitly with the subscribeOn and observeOn operators, passing the relevant schedulers. RxJava gives you thread pools for asynchronous tasks. There are several schedulers, such as IO and Computation. The IO scheduler, as the name suggests, is best suited for IO-intensive tasks such as network calls. In contrast, the Computation scheduler is good for CPU-intensive tasks. You can also hook up your own Executor services with RxJava. The built-in schedulers mainly save you from maintaining your own Executor services, which makes your code simpler.
Finally a word on subscribeOn and observeOn
In the Rx world, there are generally two things you want to control the concurrency model for:
The invocation of the subscription
The observing of notifications
SubscribeOn: specify the Scheduler on which an Observable will operate.
ObserveOn: specify the Scheduler on which an observer will observe this Observable
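The idea of producing on one scheduler and observing on another can be sketched with plain JDK classes. Note that this is not RxJava: it is only an analogue, where supplyAsync with an executor stands in for subscribeOn and thenApplyAsync with a second executor stands in for observeOn. The thread names "io-thread" and "compute-thread" are made up for the illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SchedulerDemo {
    // Single-threaded executor whose thread carries a recognisable name.
    static ExecutorService named(String name) {
        return Executors.newSingleThreadExecutor(r -> new Thread(r, name));
    }

    static String run() {
        ExecutorService io = named("io-thread");           // where the source runs
        ExecutorService compute = named("compute-thread"); // where results are handled
        try {
            return CompletableFuture
                // Analogue of subscribeOn: the producing work runs on "io".
                .supplyAsync(() -> "produced on " + Thread.currentThread().getName(), io)
                // Analogue of observeOn: downstream handling hops to "compute".
                .thenApplyAsync(
                    s -> s + ", observed on " + Thread.currentThread().getName(),
                    compute)
                .join();
        } finally {
            io.shutdown();
            compute.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints "produced on io-thread, observed on compute-thread"
        System.out.println(run());
    }
}
```

Nothing moves to another thread unless an executor is named, which mirrors the point above that concurrency has to be requested explicitly.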

Disadvantages
More memory intensive to store streams of data most of the times (since it is based on streams over time).
Might feel unconventional to learn at the start (needs everything to be a stream).
Most complexities have to be dealt with at the time of declaration of new services.
Lack of good and simple resources to learn.
Often confused to be equivalent to Functional Reactive Programming.

Apart from what is already mentioned in other responses regarding non-blocking features, another great feature of reactive programming is its use of backpressure. Normally it is used in situations where your publisher emits more information than your consumer can process.
With this mechanism you can control the flow of traffic between the two and avoid nasty out-of-memory problems.
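As a small illustration of backpressure, here is a sketch built on the JDK's own java.util.concurrent.Flow API (Java 9+) rather than RxJava or Akka. The subscriber asks for one item at a time, and the publisher is given a deliberately small buffer (4 is an arbitrary choice), so submit() has to wait whenever the consumer falls behind instead of piling items up in memory.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

class BackpressureDemo {
    // Subscriber that signals demand for exactly one item at a time.
    static final class OneAtATime implements Flow.Subscriber<Integer> {
        final List<Integer> received = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);                 // initial demand
        }
        @Override public void onNext(Integer item) {
            received.add(item);
            subscription.request(1);      // ask for the next item only when ready
        }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    static List<Integer> run(int count) {
        OneAtATime subscriber = new OneAtATime();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Small buffer: submit() blocks while the subscriber's buffer is full.
            try (SubmissionPublisher<Integer> publisher =
                     new SubmissionPublisher<>(executor, 4)) {
                publisher.subscribe(subscriber);
                for (int i = 1; i <= count; i++) {
                    publisher.submit(i);
                }
            } // close() signals onComplete once buffered items are delivered
            subscriber.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdown();
        }
        return subscriber.received;
    }

    public static void main(String[] args) {
        System.out.println(run(20).size()); // prints 20
    }
}
```

The producer never gets more than a few items ahead of the consumer, which is the flow control described above.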
You can see some practical examples of reactive programming here: https://github.com/politrons/reactive
And about back pressure here: https://github.com/politrons/Akka/blob/master/src/main/scala/stream/BackPressure.scala
By the way, the only disadvantage of reactive programming is the learning curve, because you're changing the programming paradigm. But nowadays many major companies respect and follow the Reactive Manifesto.

Reactive Programming is a style of micro-architecture involving intelligent routing and consumption of events.
The point of reactive is that you can do more with less; specifically, you can process higher loads with fewer threads.
Reactive types are not intended to let you process your requests or data faster. Their strength lies in their capacity to serve more requests concurrently, and to handle operations with latency, such as requesting data from a remote server, more efficiently.
They allow you to provide a better quality of service and a predictable capacity planning by dealing natively with time and latency without consuming more resources.
From
https://blog.redelastic.com/what-is-reactive-programming-bc9fa7f4a7fc
https://spring.io/blog/2016/06/07/notes-on-reactive-programming-part-i-the-reactive-landscape
https://spring.io/blog/2016/07/28/reactive-programming-with-spring-5-0-m1

Advantages
Cleaner code, more concise
Easier to read (once you get the hang of it)
Easier to scale (pipe any operation)
Better error handling
Event-driven inspired -> plays well with streams (Kafka, RabbitMQ, etc.)
Backpressure (client can control flow)
Disadvantages
Can become more memory intensive in some cases
Somewhat steep learning curve

Reactive programming is a kind of imperative programming.
Reactive programming is a kind of parallel programming.
You can achieve a performance gain over single-threaded execution only if you manage to create parallel branches. Whether they are executed by multiple threads or by reactive constructs (which are in fact asynchronous procedures) does not matter.
The single advantage of reactive programming over multithreaded programming is lower memory consumption (each thread requires 0.5 to 1 megabyte). The disadvantage is that the programming is less easy.
UPDATE (Aug 2020). Parallel programming comes in 2 flavours: multithreaded programming, where the main kind of activity is the thread, and asynchronous programming, where the main kind of activity is the asynchronous procedure (including actors, which are repeatable asynchronous procedures). In multithreaded programming, various means of communication are used: unbounded queues, bounded (blocking) queues, binary and counting semaphores, CountDownLatches and so on. Moreover, there is always the possibility of creating your own means of communication. In asynchronous programming, until recently, only 2 kinds of communicators were used: the future, for non-repeatable asynchronous procedures, and the unbounded queue, for actors. An unbounded queue causes problems when the producer works faster than the consumer. To cope with this problem, a new communication protocol was invented: the reactive stream, which is a combination of an unbounded queue and a counting (asynchronous) semaphore that makes the queue bounded. This is a direct analogue of the blocking queue in multithreaded programming. And programming with reactive streams was proudly called Reactive Programming (imagine if, in multithreaded programming, programming with blocking queues were called Blocking Programming). But again, the asynchronous programmer was given no means to create their own communication tools. And the asynchronous semaphore cannot be used on its own, only as part of a reactive stream. That said, the theory of asynchronous programming, including the theory of reactive programming, lags far behind the theory of multithreaded programming.
A fancy addition to reactive streams is mapping/filtering functions, which allow you to write linear pipelines like
publisher
.map(x -> mappingFunction(x))
.filter(x -> filterFunction(x))
.flatMap(...)
etc.
But this is not an exclusive feature of reactive programming. And it only allows you to create linear pipelines, while in multithreaded programming it is easy to create computational graphs of arbitrary topology.
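To illustrate that such linear pipelines are not exclusive to reactive libraries, here is a minimal sketch of the same map/filter/flatMap shape using the ordinary (synchronous) java.util.stream API. The input numbers and the lambdas are arbitrary placeholders.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LinearPipeline {
    static List<Integer> run() {
        return Stream.of(1, 2, 3, 4)
            .map(x -> x * 10)                    // 10, 20, 30, 40
            .filter(x -> x > 10)                 // 20, 30, 40
            .flatMap(x -> Stream.of(x, x + 1))   // each item expands to two
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [20, 21, 30, 31, 40, 41]
    }
}
```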
 

Related

Netty and Project Loom

I may be wrong, but as far as I understand, the whole Reactive/Event Loop thing, and Netty in particular, was invented as an answer to the C10K+ problem. It has obvious drawbacks, as all your code now becomes async, with ugly callbacks and meaningless stack traces, and is therefore hard to maintain and to reason about.
The Go language with its goroutines was one solution: now they can write sync code and also handle C10K+. So now Java comes up with Loom, which essentially copies Go's solution. Soon we will have Fibers and Continuations and will be able to write sync code again.
So the questions are:
When Loom is released in production, doesn't it make Netty kinda obsolete?
If we have Fibers and Continuations in Java, can we write nice Sync code and be ok with C10K+ without Netty?
Are there any advantages, for performance or solving C10K+, in writing Async code and using Netty, after production release of Loom?
I understand that Netty is more than just Reactive/Event Loop framework, it also has all the codecs for various protocols, which implementations will be useful somehow anyway, even afterwards.
I'm focusing on the reactive parts of Netty, since those seem to be what you mostly want addressed, and answering on a general level:
Currently, reactive programming paradigms are often used to solve performance problems, not because they fit the problem. Those cases should be covered completely by Project Loom.
However, some problems may remain where the reactive approach makes sense and is more straightforward to read than imperative code.
Reactive frameworks are typically stream oriented and are well suited to combining elements and operations on different entity/data streams. They also provide straightforward local event bus solutions with their provider/subscriber model. In such cases the reactive model might still be the best choice, performant and more readable than an imperative approach. But indeed, Project Loom should make obsolete all the "misuse" that stems from the lack of better support in the native language structures.

What's the benefit of using reactive programming over ExecutorService?

If both are asynchronous in nature, then what is the point of using reactive programming over ExecutorService in Java? In what ways is reactive programming more effective than ExecutorService?
Asynchronous programming usually includes some kind of task interaction. Different kinds of asynchronous programming provide different kinds of task interaction.
ExecutorService executes submitted tasks as soon as a processor is available; that is, it provides only the simplest form of asynchronous programming, without any task interaction at all.
Reactive programming provides channels for exchanging messages with backpressure, which is quite an advanced kind of task interaction. But under the hood, it still uses an ExecutorService.
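For comparison, this is roughly what "tasks without interaction" looks like with a plain ExecutorService. It is only a sketch; the task (squaring a number) and the pool size of 4 are arbitrary. Each task runs on its own, and the only coordination is the caller collecting the results at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IndependentTasks {
    static int sumOfSquares(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Every task is self-contained; tasks never exchange messages.
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int x = i;
                tasks.add(() -> x * x);
            }
            int sum = 0;
            // invokeAll waits for all tasks; get() then just reads each result.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                sum += f.get();
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(4)); // prints 30
    }
}
```

There is no way here for a slow consumer to tell a fast producer to slow down; that is the part a reactive stream adds.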

Achieving Concurrency and Parallelism in Java 7

As part of a study I am doing, I am exploring the supposed simplicity of using languages like Scala & Clojure to achieve concurrency on the JVM.
By simplicity, I am hoping to prove that these languages provide easier concurrency constructs than what Java 7 provides.
Therefore, I am hoping to find some good references that explain the complexities of Java's concurrency model.
Outside of pointing me in the direction of Google (which I have already searched with limited success), I would appreciate if those in-the-know could provide me with some good references to get me started off in this area.
Thanks
Java 7 does not support lambda expressions (they only arrived in Java 8). Creating an inline callback (e.g. for the completion of an asynchronous call) requires 5 lines of boilerplate for an anonymous type.
This strongly discourages people from using callbacks. This is probably why Java 7 still does not have an interface for a callback that takes a value (as opposed to Runnable and Callable), whereas C# has had one since 2005.
Therefore, the JDK does not have any real support for asynchronous operations.
The key to an asynchronous operation is the ability to kick off a long-running request, and have it run a callback when it finishes, without consuming a thread for the duration of the request. In Java, you can only do this by making a separate thread call get() on a Future<V>. This limits the concurrency of an application using the standard API to the number of threads you can sanely support.
To solve this problem, Google's Guava framework for better Java code introduces a ListenableFuture<V> interface which does have completion callbacks.
Languages like Scala fix this problem by supporting lambda expressions (which compile to anonymous classes) and adding their own Promise / Future types.
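To show the difference between parking a thread on Future.get() and registering a completion callback, here is a small sketch. Be aware that it uses lambdas and CompletableFuture, which only exist from Java 8 onwards, so it does not apply to Java 7 itself. The "result" string is a placeholder for a long-running request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CallbackDemo {
    // Java 7 style: the calling thread is parked inside get() until the
    // task finishes, so every pending request costs a waiting thread.
    static String blockingStyle() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(() -> "result");
            return future.get().toUpperCase();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Callback style (Java 8+): the continuation is registered up front and
    // runs when the value arrives; no thread waits in between.
    static CompletableFuture<String> callbackStyle(Executor pool) {
        return CompletableFuture
            .supplyAsync(() -> "result", pool)
            .thenApply(String::toUpperCase);
    }

    public static void main(String[] args) {
        System.out.println(blockingStyle()); // prints "RESULT"
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // join() is used here only so the demo can print the value.
            System.out.println(callbackStyle(pool).join()); // prints "RESULT"
        } finally {
            pool.shutdown();
        }
    }
}
```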
While higher-level languages make it easier to use multiple cores, what is often forgotten is why you want to use multiple cores in the first place: to make the program faster, e.g. to increase its throughput.
When you consider options which increase concurrency, you need to test whether these options actually improve performance in some way. (Because very often they don't)
e.g. STM (Software Transactional Memory) makes it easier to write multi-threaded applications without having to worry about concurrency issues. The problem is that for trivial examples, it would be faster to not use STM and only use one thread.
Using multiple threads adds complexity and makes your application more fragile, so there has to be a good reason to do it otherwise you should stick to the simplest solution possible.
For more discussion
http://vanillajava.blogspot.co.uk/2011/11/why-concurency-examples-are-confusing.html

Java Non-Blocking and Asynchronous IO with NIO & NIO.2 (JSR203) - Reactor/Proactor Implementations

So here I am reading one of my favorite software pattern books (Pattern-Oriented Software Architecture - Patterns for Concurrent and Networked Objects), specifically the sections on Proactor/Reactor asynchronous IO patterns. I can see how, by using selectable channels, I can implement a Reactor style asynchronous IO mechanism quite easily (and have done so). But I cannot see how I would implement a proper Proactor mechanism with non-blocking writes, that is, one taking advantage of OS managed non-blocking write functions.
Functionality supported by OS specific calls like GetQueuedCompletionStatus under win32.
I did see that Java 7 brings some updates to NIO with asynchronous completion handlers (which seems to be in the right direction). That being said... Given the lack of unified cross-platform support for OS managed async operations (specifically async write), I am assuming that this is a quasi-implementation that is not utilizing native OS support.
So my questions are, is proactor based IO handling possible in Java in such a way that it is advantageous to use for specific scenarios; and, if Java NIO does support proactor based IO handling (either in Java 6 or Java 7) is OS managed asynchronous IO support (i.e. completion callbacks from the OS) being utilized? Furthermore, if the implementation is purely in-VM are the performance benefits so little that using proactive event handling offers nothing more than a different (possibly simpler) way of constructing concurrent network handling software.
For anyone interested in proactive event handling here is a good article that outlines pros / cons and a comparison to both traditional thread-per-connection and reactive IO models.
There are lots of factors involved in this one. I will try to summarize my findings as best as possible (aware of the fact that there is contention regarding the usefulness of reactor and proactor IO handling implementations).
Is proactor based IO handling possible in Java in such a way that it is advantageous to use for specific scenarios.
Java 1.4 introduced non-blocking IO which is NOT the same as asynchronous IO. Java SE 7 introduces asynchronous IO with JSR203 making "true" proactor style IO handling implementations possible.
See AsynchronousSocketChannel, AsynchronousServerSocketChannel
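For a feel of the completion-handler (proactor) style that JSR 203 introduced, here is a minimal sketch. To keep it runnable without a network peer it uses AsynchronousFileChannel on a temporary file, which is an assumption made for the demo; AsynchronousSocketChannel takes the same CompletionHandler interface for its read and write calls. The CompletableFuture wrapper is Java 8+ and is only there to hand the result back to the caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

class ProactorSketch {
    // Starts a read and returns immediately; the handler is invoked
    // later, once the operation has completed.
    static CompletableFuture<String> readAsync(Path file) {
        CompletableFuture<String> result = new CompletableFuture<>();
        try {
            AsynchronousFileChannel channel =
                AsynchronousFileChannel.open(file, StandardOpenOption.READ);
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            channel.read(buffer, 0, buffer, new CompletionHandler<Integer, ByteBuffer>() {
                @Override
                public void completed(Integer bytesRead, ByteBuffer buf) {
                    closeQuietly(channel);
                    buf.flip();
                    result.complete(StandardCharsets.UTF_8.decode(buf).toString());
                }

                @Override
                public void failed(Throwable exc, ByteBuffer buf) {
                    closeQuietly(channel);
                    result.completeExceptionally(exc);
                }
            });
        } catch (IOException e) {
            result.completeExceptionally(e);
        }
        return result;
    }

    static void closeQuietly(AsynchronousFileChannel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
            // nothing useful to do in a demo
        }
    }

    // Writes a small temp file and reads it back through the handler.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("proactor", ".txt");
            try {
                Files.write(tmp, "hello proactor".getBytes(StandardCharsets.UTF_8));
                return readAsync(tmp).join(); // join() only so the demo can print
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hello proactor"
    }
}
```

The caller never polls for readiness; it is told when the operation is finished, which is the essential difference from the selector-based reactor style.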
and, if Java NIO does support proactor based IO handling (either in Java 6 or Java 7) is OS managed asynchronous IO support (i.e. completion callbacks from the OS) being utilized?
Reading through the JSR 203 specs, completion handlers using new asynchronous channels are definitely supported and it is reported that native OS features are being utilized but I have not ascertained to what extent yet. I may follow up on this after an analysis of the Java 7 source (unless someone beats me to it).
Furthermore, if the implementation is purely in-VM are the performance benefits so little that using proactive event handling offers nothing more than a different (possibly simpler) way of constructing concurrent network handling software.
I have not been able to find any performance comparisons regarding new Asynchronous IO features in Java 7. I'm sure they will become available in the near future.
As always, when presented with more than one way to tackle a problem, the question of which approach is better is almost always answered with "it depends". Proactive event handling (using asynchronous completion handlers) is included with Java 7 and does not simply exist without purpose. For certain applications, it will make sense to use such IO handling. Historically, a common example of where proactor applies well is an HTTP server where many short requests are issued frequently. For a deeper explanation give this a read (provided only to highlight the advantages of proactor, so try to overlook the fact that the example code is C++).
IMO it seems obvious that in many circumstances reactor/proactor complicate what would otherwise be a very simple design using a more traditional approach and in other more complex systems they offer a high degree of simplification and flexibility.
.
.
.
On a side note I highly recommend reading through the following presentation about NIO which offers performance comparison between NIO and the "traditional" approach. Though I would also advise caution regarding the results presented as the NIO implementation in the benchmark was based on the pre Java 1.4 NBIO NIO library and not the NIO implementation shipped in 1.4.
I would check you really need to worry about blocking writes.
A read blocks when there is no data to read. This can be most of the time. However, a write blocks only when the buffers are full; this happens very rarely and often indicates a slow connection or a failed consumer.
If you want non-blocking IO, do it for the reads, and therefore for the writes as well.
Note: Using blocking IO with NIO is usually simpler and can outperform non-blocking NIO. Unless you have thousands of connections, you are likely to find that the added complexity is not worth it (and is possibly not the best option).
one of my favorite software pattern books (Pattern-Oriented Software Architecture - Patterns for Concurrent and Networked Objects)
With respect, that book is very out of date and of dubious relevance at any date. It came out of the design pattern frenzy of the late 1990s, when there was a concerted attempt to reduce the whole of computer science to design patterns.
My present view is that NIO is already a framework and a design pattern.
NIO already provides an implementation of the reactive pattern (selectors), and NIO2 adds an implementation of the proactive pattern (completion handlers).
Don't reinvent it, just use it, because you cannot beat its performance (which, after all, is what anyone trying to avoid blocking I/O is after) with a pure Java solution, as you don't get access to the non-blocking / asynchronous features of the underlying OS. But NIO and NIO2 make use of those, which makes them fast.
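As a reminder of what the selector-based (reactive) side looks like, here is a minimal sketch of a selector loop. To stay self-contained it watches an in-process java.nio.channels.Pipe rather than a socket, which is an assumption made for the demo; a SocketChannel would be registered with the Selector in the same way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

class ReactorSketch {
    static String demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                // Register interest in "readable" events on the non-blocking end.
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);

                // Something arrives on the channel.
                sink.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));

                StringBuilder received = new StringBuilder();
                while (received.length() < 4) {
                    selector.select(); // wait until some channel is ready
                    for (SelectionKey key : selector.selectedKeys()) {
                        if (key.isReadable()) {
                            // The read itself is done by our code, not by the OS
                            // on our behalf; that is the reactor style.
                            ByteBuffer buf = ByteBuffer.allocate(64);
                            ((Pipe.SourceChannel) key.channel()).read(buf);
                            buf.flip();
                            received.append(StandardCharsets.UTF_8.decode(buf));
                        }
                    }
                    selector.selectedKeys().clear(); // reset for the next select()
                }
                return received.toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "ping"
    }
}
```

One thread can watch many channels this way, since it only touches a channel after the selector reports it as ready.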

Confused, are languages like python, ruby single threaded? unlike say java? (for web apps)

I was reading how Clojure is 'cool' because of its syntax + it runs on the JVM so it is multithreaded etc. etc.
Are languages like ruby and python single threaded in nature then? (when running as a web app).
What are the underlying differences between python/ruby and say java running on tomcat?
Doesn't the web server have a pool of threads to work with in all cases?
Both Python and Ruby have full support for multi-threading. There are some implementations (e.g. CPython, MRI, YARV) which cannot actually run threads in parallel, but that's a limitation of those specific implementations, not the language. This is similar to Java, where there are also some implementations which cannot run threads in parallel, but that doesn't mean that Java is single-threaded.
Note that in both cases there are lots of implementations which can run threads in parallel: PyPy, IronPython, Jython, IronRuby and JRuby are only a few examples.
The main difference between Clojure on the one side and Python, Ruby, Java, C#, C++, C, PHP and pretty much every other mainstream and not-so-mainstream language on the other side is that Clojure has a sane concurrency model. All the other languages use threads, which we have known to be a bad concurrency model for at least 40 years. Clojure OTOH has a sane update model which allows it to not only present one but actually multiple sane concurrency models to the programmer: atomic updates, software transactional memory, asynchronous agents, concurrency-aware thread-local global variables, futures, promises, dataflow concurrency and in the future possibly even more.
A confused question with a lot of confused answers...
First, threading and concurrent execution are different things. Python supports threads just fine; it doesn't support concurrent execution in any real-world implementation. (In all serious implementations, only one VM thread can execute at a time; the many attempts to decouple VM threads have all failed.)
Second, this is irrelevant for web apps. You don't need Python backends to execute concurrently in the same process. You spawn separate processes for each backend, which can then each handle requests in parallel because they're not tied together at all.
Using threads for web backends is a bad idea. Why introduce the perils of threading--locking, race conditions, deadlocks--to something inherently embarrassingly parallel? It's much safer to tuck each backend away in its own isolated process, avoiding the potential for all of these problems.
(There are advantages to sharing memory space--it saves memory, by sharing static code--but that can be solved without threads.)
CPython has a Global Interpreter Lock which can reduce the performance of multi-threaded code in Python. The net effect, in some cases, is that threads can't actually run simultaneously because of lock contention. Not all Python implementations use a GIL, so this may not apply to Jython, IronPython or other implementations.
The language itself does support threading and other asynchronous operations. The python libraries can also support threading internally without exposing it directly to the Python interpreter.
If you've heard anything negative about Python and threading (or that it doesn't support it), it is probably from someone encountering a situation where the GIL is causing a bottleneck.
Certainly the webserver will have a pool of threads. That's only outside the control of your program. Those threads are used to handle HTTP requests. Each HTTP request is handled in a separate thread and the thread is released back to pool when the associated HTTP response is finished. If the webserver doesn't have such a pool, it would have been extremely slow in serving.
Whether a programming language is single-threaded or multithreaded depends on whether you can programmatically spawn new threads in the language in question. If that isn't possible, then the language is single-threaded, for example PHP. As far as I can see, both Ruby and Python support multithreading.
The short answer is yes, they are single threaded.
The long answer is it depends.
JRuby is multithreaded and can be run in tomcat like other java code. MRI (default ruby) and Python both have a GIL (Global Interpreter Lock) and are thus single threaded.
The way it works for web servers is further complicated by the number of available server configurations. For most ruby applications there are (at least) two levels of servers, a proxy/static file server like nginx and then the ruby app server.
Nginx does not use threads like apache or tomcat, it uses non-blocking events (and I think forked worker processes). This allows it to deal with higher levels of concurrency than would be allowed with the overhead and scheduling inefficiencies of native threads.
The various Ruby app servers also work in different ways to get high throughput and concurrency without threads. Thin uses libev and the asynchronous evented model, like Nginx. Mongrel uses a round-robin pool of worker processes. Unicorn uses native Unix IPC (select on a socket) to load balance to a pool of forked processes through one master proxy socket.
Threads are only one way to address concurrency. Multiple processes and evented models are a different approach that ties in well with the Unix base. This is fundamentally different from the way Java treats the world.
Python
Let me try to put it more simply than the more detailed answers.
The heart of the answer here doesn't really have to do with Python being single-threaded versus multi-threaded. It has more to do with threading versus multiprocessing.
Saying Python is "single-threaded" doesn't really capture reality, because you can certainly have more than one thread running in a Python process. Just use the threading library, and create more than one thread. There, now you have just proven that Python isn't single-threaded.
But using multiple threads in Python does NOT mean you're using multiple CPU processors concurrently. In fact, the Global Interpreter Lock prevents this. So this is where questions arise.
Basically, threading in Python cannot be used for parallel CPU computation. But you CAN do parallel CPU computation with Python by using multiprocessing instead of multi-threading.
I found this article very helpful when researching this: https://timber.io/blog/multiprocessing-vs-multithreading-in-python-what-you-need-to-know/ . It includes real-world examples of when you'd want to use multiprocessing versus multi-threading.
Most languages don't define single or multithreading. Usually, that is left up to the libraries to implement.
That being said, some languages are better at it than others. CPython, for instance, has issues with interpreter locking during multithreading; Jython (Python running on the JVM) does not.
Some of the real power of Clojure (IMO) is that it runs on the JVM. You get multithreading and tons of libraries for free.
A few interpreted programming languages such as CPython and Ruby support threading, but have a limitation that is known as a Global Interpreter Lock (GIL). The GIL is a mutual exclusion lock held by the interpreter that prevents the interpreter from concurrently interpreting the applications code on two or more threads at the same time, which effectively limits the concurrency on multiple core systems.
(From Wikipedia: Thread)
Keeping this very short:
Python supports Multi Threading.
Python does NOT support parallel execution of its Threads.
Exception:
Above statement may vary with implementations of Python not using GIL (Global Interpreter Locking).
If a particular implementation is not using GIL, then, that would be Multi Threaded as well as support Parallel Execution
Ruby
The Ruby Interpreter is single threaded, which is to say that several of its methods are not thread safe.
In the Rails world, this single-thread has mostly been pushed to the server. So, you'll see nginx running with a pool of mongrel servers, each of which has an interpreter in memory, processes 1 request at a time, and in its own thread.
Passenger, running "ruby enterprise" brings the concept of garbage collection and some thread safety into Rails, and it's nice.
Still work to be done in Rails on this area, but it's getting there slowly -- but in general, the idea is to have multiple services and servers.
How to untangle the knots in all those threads...
Clojure did not invent threading; however, it has particularly strong support for it with Software Transactional Memory, Atoms, Agents, parallel map operations, ...
All the others have accumulated threading support over time. Ruby is a special case, as some implementations have green threads, which are a kind of software-emulated thread and do not use all the cores. 1.9 will put this to rest.
Regarding web servers: no, they do not always work multithreaded. Apache has traditionally run as a flock of daemons, which are a pool of separate single-threaded processes. Nowadays there are more options for running Apache servers.
To summarize, all modern languages support threading in one form or another.
Newer languages like Scala and Clojure are adding specific support to improve working with multiple threads without explicit locking, as this has traditionally been the great pitfall of multithreading.
Reading these answers here... A lot of them try to sound smarter than they really are, IMHO (I'm mostly talking about the Ruby-related stuff, as that's what I'm most familiar with).
In fact, JRuby is currently the only Ruby implementation that supports true concurrency. On the JVM, Ruby threads are mapped to native OS threads, without a GIL interfering. So it's totally correct to say that Ruby is not multithreaded.
In 1.8.x, Ruby actually runs inside one OS thread, and while green threads give you a fake feeling of concurrency, in reality the GIL will pretty much prevent you from having true concurrency.
In Ruby 1.9 this changed a bit, as a Ruby process can now have many OS threads attached to it (plus the green threads), but again the GIL will totally defeat the point and become the bottleneck.
In practice, from a regular webapp standpoint, it should not matter much if its single or multithreaded. The problem mostly arises on the server side anyhow and it mostly is a matter of scaling technique difference.
Yes, Ruby and Python can handle multi-threading, but in many cases (web) it is better to rely on the threads created for the HTTP requests from the client to the server. Even if you spawn many threads within one application to lower the runtime cost or to handle many tasks at a time, in a web application that usually still takes too long: no one will happily wait more than a fraction of a second for a single page to respond. It is wiser to use AJAX (Asynchronous JavaScript And XML) techniques: make sure the design of your page shows up rapidly, and insert the heavier content asynchronously later.
That does not mean that multi-threading is useless for the web! It's highly recommended for lowering the load on your server if you want to run recursive, complicated, hardcore applications (not for a website, I mean), but whatever they return must end up in files or in databases, so that it can then be served smoothly in an HTTP response.
