I'm just getting started with Spring and I want to build a RESTful API for a project I'm working on. My backend makes a lot of HTTP calls to third-party services, so I've decided it would be prudent to use a reactive design and keep the architecture non-blocking. I'm using Retrofit, whose callback-based async API will work fine for me. Here's the problem: I've already implemented my database and models using Hibernate and JPA. It's really mature and can handle everything from migrations to validations and everything in between. I like using JPA, but it's blocking and so doesn't fit neatly into my architecture. Is it okay to use the reactive stack everywhere else and migrate the persistence layer to a reactive model later, when the tooling and frameworks are closer to par with JPA? The main issue is creating the database schema at start-up; if there's a solution to that, I'd be glad to work with it.
Blocking in a fully reactive WebFlux application is bad from a performance perspective.
WebFlux starts with very few threads, which means that if you block there is a high risk that your application will, under load, suffer from thread starvation. This is because reactive applications do not spawn new threads on demand, so your application's small thread pool ends up blocked waiting for responses from your database.
There is a workaround: place all potentially blocking calls on their own scheduler using the subscribeOn operator. This is documented in the Reactor documentation.
Just remember that by wrapping blocking calls you will not get the benefits of reactive programming, such as a smaller memory footprint and potentially higher throughput. But at least you will not suffer from thread starvation.
Those calls will instead behave like regular calls to a regular Spring Boot web server, since each call is assigned a single thread that follows it throughout execution.
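To make the workaround concrete, here is a minimal sketch of the same idea using only the JDK, so it runs without any dependencies. In Reactor itself this is usually written as Mono.fromCallable(...).subscribeOn(Schedulers.boundedElastic()). The names findUserBlocking, findUser and the pool size are assumptions made up for illustration; the sleep stands in for a JPA query.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BlockingOffload {
    // Hypothetical stand-in for a blocking JPA/JDBC query.
    static String findUserBlocking(long id) {
        try {
            Thread.sleep(50); // simulates waiting on the database
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "user-" + id + " loaded on " + Thread.currentThread().getName();
    }

    // Runs the blocking call on the dedicated pool so the caller's thread stays free.
    static CompletableFuture<String> findUser(long id, ExecutorService blockingPool) {
        return CompletableFuture.supplyAsync(() -> findUserBlocking(id), blockingPool);
    }

    public static void main(String[] args) {
        // Assumed pool size; every thread in it is named "blocking-pool".
        ExecutorService blockingPool =
                Executors.newFixedThreadPool(4, r -> new Thread(r, "blocking-pool"));
        CompletableFuture<String> result = findUser(42L, blockingPool);
        System.out.println("caller is not blocked: " + Thread.currentThread().getName());
        System.out.println(result.join());
        blockingPool.shutdown();
    }
}
```

The blocking call still occupies one pool thread for its whole duration, which is why this protects the small request-handling pool but does not bring the memory or throughput benefits mentioned above.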
I currently have a Vert.x codebase. I was using Golang, but Golang kinda sucks and doesn't have a good ORM. But apparently, Vert.x doesn't have a good ORM either, primarily because Vert.x is non-blocking and most ORMs for Java were based on blocking APIs.
Anyhow, I have a specific question - I read that Hibernate/JPA could be used with Vert.x - what we could do is put the Hibernate calls in a different Verticle and then it would be non-blocking.
Is that a good idea? Can someone show an example of doing that with 2 different Vert.x verticles?
If it's not a good idea, what might be a good ORM to use? Naked SQL calls sound cool at first, but for migrations and stuff, it might get kinda crazy.
@tsegismont, as he usually does, already provided a good solution in the comments. I would just like to clarify the following sentence:
I read that Hibernate/JPA could be used with Vert.x - what we could do is put the Hibernate calls in a different Verticle and then it would be non-blocking
There is a true and a false part there:
Hibernate/JPA could be used with Vert.x
True. By putting blocking code in a worker verticle you don't block the Vert.x event loop, and that allows frameworks based on JDBC to work with Vert.x.
put the Hibernate calls in a different Verticle and then it would be non-blocking
False. You don't make Hibernate non-blocking. JDBC is blocking by its nature, and there's not much that can be done to solve that (although R2DBC is a nice initiative). You'll use the same thread pool you were using before, with the same limitations.
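The "same limitations" point can be illustrated with a small JDK-only sketch (not Vert.x code). A fixed pool plays the role of the worker pool, and a sleep plays the role of a blocking JDBC call; the class and method names are hypothetical. However many calls are submitted, no more than the pool size can ever be in flight at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class WorkerPoolLimit {
    // Runs `calls` simulated blocking queries on a pool of `workers` threads and
    // returns the highest number of queries that were in flight at the same time.
    static int maxConcurrentQueries(int workers, int calls) {
        ExecutorService workerPool = Executors.newFixedThreadPool(workers);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (int i = 0; i < calls; i++) {
            futures.add(CompletableFuture.runAsync(() -> {
                int now = inFlight.incrementAndGet();
                maxInFlight.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // hypothetical blocking JDBC call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                inFlight.decrementAndGet();
            }, workerPool));
        }
        futures.forEach(CompletableFuture::join);
        workerPool.shutdown();
        return maxInFlight.get();
    }

    public static void main(String[] args) {
        // Eight queries, two workers: the reported maximum never exceeds two.
        System.out.println("max concurrent queries: " + maxConcurrentQueries(2, 8));
    }
}
```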
I keep studying and trying out the reactive style of coding using Reactor and RxJava. I understand that reactive coding makes better use of the CPU compared to single-threaded execution.
Is there any concrete comparison between reactive programming vs imperative programming in web based applications?
How much performance gain and throughput can I achieve by using reactive programming over non-reactive programming?
Also what are the advantages and disadvantages of Reactive Programming?
Is there any statistical benchmark?
Well, Reactive Programming means you do all your IO-bound tasks, such as network calls, asynchronously. For instance, if your application calls an external REST API or a database, you can make that invocation asynchronously. If you do so, your current thread does not block. You can serve lots of requests with just one or a few threads. If you follow the blocking approach, you need one thread to handle each and every request. You may refer to my multi-part blog post (part one, part two and part three) for further details.
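As a rough illustration of "lots of requests with one thread", the sketch below uses only the JDK. A scheduled callback stands in for a non-blocking network call: instead of sleeping while the response is pending, the single thread just registers what to do when it arrives. The class name, the thread name "event-loop" and the latency are assumptions for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SingleThreadServer {
    // Handles `requests` simulated calls, each with `latencyMs` of IO wait,
    // and returns the names of all threads that processed a response.
    static Set<String> serve(int requests, long latencyMs) {
        ScheduledExecutorService eventLoop =
                Executors.newSingleThreadScheduledExecutor(r -> new Thread(r, "event-loop"));
        Set<String> threads = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            // Register a callback for when the "response" arrives; nothing blocks here.
            eventLoop.schedule(() -> {
                threads.add(Thread.currentThread().getName());
                done.countDown();
            }, latencyMs, TimeUnit.MILLISECONDS);
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        eventLoop.shutdown();
        return threads;
    }

    public static void main(String[] args) {
        // All 100 requests are pending at the same time, yet one thread serves them.
        System.out.println(serve(100, 20));
    }
}
```

With a thread-per-request model the same 100 overlapping waits would need 100 threads, each mostly idle.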
Other than that, you can use callbacks to do the same thing. You can make asynchronous invocations using callbacks, but if you do so you may end up in callback hell. Nesting one callback inside another leads to very complex code that is very hard to maintain. RxJava, on the other hand, lets you write asynchronous code that is much simpler, more composable and more readable. RxJava also provides lots of powerful operators such as map, zip etc., which make your code much simpler while boosting performance through parallel execution of tasks that do not depend on each other.
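The zip idea can be sketched without RxJava using the JDK's CompletableFuture, whose thenCombine plays a similar role for two single results. The two fetch methods are hypothetical stand-ins for independent remote calls; they are started separately and combined without nesting callbacks.

```java
import java.util.concurrent.CompletableFuture;

class ZipExample {
    // Hypothetical remote call returning a customer name.
    static CompletableFuture<String> fetchName() {
        return CompletableFuture.supplyAsync(() -> "Alice");
    }

    // Hypothetical remote call returning an order count; independent of fetchName().
    static CompletableFuture<Integer> fetchOrderCount() {
        return CompletableFuture.supplyAsync(() -> 3);
    }

    // Both calls are started independently; the combiner runs once both have finished.
    static CompletableFuture<String> summary() {
        return fetchName()
                .thenCombine(fetchOrderCount(), (name, count) -> name + " has " + count + " orders");
    }

    public static void main(String[] args) {
        System.out.println(summary().join()); // prints "Alice has 3 orders"
    }
}
```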
RxJava is not just another Observer implementation with a set of operators; it also gives you good error handling and retry mechanisms, which are really handy.
I have not benchmarked RxJava against an imperative approach, so I cannot give you statistics. But I am fairly sure RxJava should perform well compared with blocking mechanisms.
Update
Since I gathered more experience over time, I thought of adding more points to my answer.
Based on the article, ReactiveX is a library for composing asynchronous and event-based programs by using observable sequences. I recommend you go through this introductory article first.
These are some properties of reactive systems: Event Driven, Scalable, Resilient, Responsive
When it comes to RxJava, it offers two main facilities to a programmer. First, it offers a nice composable API with a rich set of operators such as zip, concat, map etc. This yields simpler and more readable code. When it comes to code, readability and simplicity are the most important properties. Second, it provides excellent abstractions that allow concurrency to become declarative.
A popular misconception is that Rx is multithreaded by default. In fact, Rx is single-threaded by default. If you want to do things asynchronously, you have to say so explicitly with the subscribeOn and observeOn operators, passing the relevant schedulers. RxJava gives you thread pools for asynchronous tasks. There are several schedulers, such as IO, Computation and so forth. The IO scheduler, as the name suggests, is best suited for IO-intensive tasks such as network calls. The Computation scheduler, by contrast, is good for CPU-intensive computation tasks. You can also hook up your own Executor services with RxJava. The built-in schedulers mainly save you from maintaining your own Executor services, making your code simpler.
Finally a word on subscribeOn and observeOn
In the Rx world, there are generally two things you want to control the concurrency model for:
The invocation of the subscription
The observing of notifications
SubscribeOn: specify the Scheduler on which an Observable will operate.
ObserveOn: specify the Scheduler on which an observer will observe this Observable
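The two control points can be approximated with the JDK alone, which may help before reading the RxJava versions (roughly subscribeOn(Schedulers.io()) and observeOn(Schedulers.computation())). In this sketch, the executor passed to supplyAsync decides where the value is produced, and the executor passed to thenApplyAsync decides where it is consumed. The executor names "io" and "computation" are assumptions chosen to mirror the schedulers above.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SchedulerHop {
    // Single-thread executor whose only thread carries the given name.
    static ExecutorService named(String name) {
        return Executors.newSingleThreadExecutor(r -> new Thread(r, name));
    }

    // Produces a value on the "io" executor and transforms it on the "computation"
    // executor, recording which thread ran each step.
    static List<String> run() {
        ExecutorService io = named("io");
        ExecutorService computation = named("computation");
        List<String> trace = new CopyOnWriteArrayList<>();
        CompletableFuture
                .supplyAsync(() -> {
                    trace.add("produced on " + Thread.currentThread().getName());
                    return 21;
                }, io)
                .thenApplyAsync(n -> {
                    trace.add("consumed on " + Thread.currentThread().getName());
                    return n * 2;
                }, computation)
                .join();
        io.shutdown();
        computation.shutdown();
        return trace;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```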
Disadvantages
More memory intensive most of the time, since streams of data have to be stored (it is based on streams over time).
Might feel unconventional to learn at the start (everything needs to be a stream).
Most complexities have to be dealt with at the time of declaration of new services.
Lack of good and simple resources to learn.
Often confused to be equivalent to Functional Reactive Programming.
Apart from what has already been mentioned in other responses regarding non-blocking features, another great feature of reactive programming is its use of backpressure. It is typically used in situations where your publisher emits more information than your consumer can process.
With this mechanism you can control the flow of traffic between the two and avoid nasty out-of-memory problems.
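A minimal sketch of backpressure using the JDK's own Reactive Streams interfaces (java.util.concurrent.Flow) rather than RxJava or Akka. The subscriber requests one item at a time, and the publisher is given a deliberately small buffer, so submit() blocks the producer instead of letting items pile up in memory. The class names and the buffer size are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

class BackpressureDemo {
    // Subscriber that asks for one item at a time, so it is never flooded.
    static class OneAtATimeSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) { subscription = s; s.request(1); }
        @Override public void onNext(Integer item) { received.add(item); subscription.request(1); }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    // Publishes 1..count and returns what the subscriber received.
    static List<Integer> publish(int count) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        OneAtATimeSubscriber subscriber = new OneAtATimeSubscriber();
        // Buffer of 2 per subscriber: submit() blocks while the buffer is full.
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>(executor, 2)) {
            publisher.subscribe(subscriber);
            for (int i = 1; i <= count; i++) {
                publisher.submit(i);
            }
        } // close() signals onComplete once pending items are delivered
        try {
            subscriber.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        executor.shutdown();
        return subscriber.received;
    }

    public static void main(String[] args) {
        System.out.println(publish(10));
    }
}
```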
You can see some practical examples of reactive programming here: https://github.com/politrons/reactive
And about back pressure here: https://github.com/politrons/Akka/blob/master/src/main/scala/stream/BackPressure.scala
By the way, the only disadvantage of reactive programming is the learning curve, because you're changing the programming paradigm. But nowadays many major companies follow the Reactive Manifesto.
Reactive Programming is a style of micro-architecture involving intelligent routing and consumption of events.
The point of reactive is that you can do more with less: specifically, you can process higher loads with fewer threads.
Reactive types are not intended to let you process your requests or data faster. Their strength lies in their capacity to serve more requests concurrently, and to handle operations with latency, such as requesting data from a remote server, more efficiently.
They allow you to provide a better quality of service and a predictable capacity planning by dealing natively with time and latency without consuming more resources.
From
https://blog.redelastic.com/what-is-reactive-programming-bc9fa7f4a7fc
https://spring.io/blog/2016/06/07/notes-on-reactive-programming-part-i-the-reactive-landscape
https://spring.io/blog/2016/07/28/reactive-programming-with-spring-5-0-m1
Advantages
Cleaner code, more concise
Easier to read (once you get the hang of it)
Easier to scale (pipe any operation)
Better error handling
Event-driven inspired -> plays well with streams (Kafka, RabbitMQ, etc.)
Backpressure (client can control flow)
Disadvantages
Can become more memory intensive in some cases
Somewhat steep learning curve
Reactive programming is a kind of imperative programming.
Reactive programming is a kind of parallel programming.
You can achieve a performance gain over single-threaded execution only if you manage to create parallel branches. Whether they are executed by multiple threads or by reactive constructs (which are in fact asynchronous procedures) does not matter.
The single advantage of reactive programming over multithreaded programming is lower memory consumption (each thread requires 0.5...1 megabyte). The disadvantage is less easy programming.
UPDATE (Aug 2020). Parallel programming comes in two flavours: multithreaded programming, where the main unit of activity is the thread, and asynchronous programming, where the main unit of activity is the asynchronous procedure (including actors, which are repeatable asynchronous procedures). In multithreaded programming, various means of communication are used: unbounded queues, bounded (blocking) queues, binary and counting semaphores, CountDownLatches and so on. Moreover, there is always the possibility of creating your own means of communication. In asynchronous programming, until recently, only two kinds of communicators were used: futures for non-repeatable asynchronous procedures, and unbounded queues for actors. An unbounded queue causes problems when the producer works faster than the consumer. To cope with this problem, a new communication protocol was invented: the reactive stream, which is a combination of an unbounded queue and a counting (asynchronous) semaphore that makes the queue bounded. This is a direct analogue of the blocking queue in multithreaded programming. And programming with reactive streams was proudly called Reactive Programming (imagine if, in multithreaded programming, programming with blocking queues were called Blocking Programming). But again, the asynchronous programmer was given no means to create their own communication tools. And the asynchronous semaphore cannot be used on its own, only as part of a reactive stream. That said, the theory of asynchronous programming, including the theory of reactive programming, lags far behind the theory of multithreaded programming.
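For reference, this is the multithreaded side of that analogy: a bounded queue from the JDK that pushes back on a fast producer. The sketch uses offer(), which reports a full queue by returning false, instead of put(), which would block, purely so that the example stays single-threaded and deterministic. The capacity of 2 is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedQueueDemo {
    // Records whether each offer was accepted by a queue with capacity 2.
    static List<Boolean> offers() {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2);
        List<Boolean> accepted = new ArrayList<>();
        accepted.add(queue.offer(1)); // accepted, queue holds 1 item
        accepted.add(queue.offer(2)); // accepted, queue is now full
        accepted.add(queue.offer(3)); // rejected: the producer is ahead of the consumer
        queue.poll();                 // the consumer takes one item
        accepted.add(queue.offer(3)); // accepted again, since there is room
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(offers()); // prints "[true, true, false, true]"
    }
}
```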
A fancy addition to reactive streams is mapping/filtering functions that let you write linear pipelines like
publisher
    .map(x -> mappingFunction(x))
    .filter(x -> filterFunction(x))
    .flatMap(...)
etc.
But this is not an exclusive feature of reactive programming. And it only lets you create linear pipelines, while in multithreaded programming it is easy to create computational graphs of arbitrary topology.
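To show that this pipeline style is not specific to reactive libraries, here is the same shape written with the JDK's ordinary (synchronous) java.util.stream API. The concrete mapping, filtering and flat-mapping functions are made up for the example.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PipelineDemo {
    // Squares each number, keeps the odd squares, then expands each into two labels.
    static List<String> run(List<Integer> input) {
        return input.stream()
                .map(n -> n * n)
                .filter(n -> n % 2 == 1)
                .flatMap(n -> Stream.of("v" + n, "w" + n))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3))); // prints "[v1, w1, v9, w9]"
    }
}
```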
Today I found that for concurrency in Java we have good frameworks like Akka, and I also found that there are reactive programming frameworks like RxJava for multithreading in an application. But I'm still confused: why are both better than the Java concurrency framework?
Nowadays reactive programming is a mature topic, and most languages have support for Functional Reactive Programming; for example, Netflix provides reactive programming APIs for more than one language. RxJava is one such API, used from Java, Scala, etc. According to RxJava, they internally use actors to manage multithreading, and Akka also uses actors for multithreaded programming.
So, what is the difference between Akka and the reactive programming approach, and why are they better than Java concurrency?
According to Mathias Doenitz, at this point in time RxJava doesn't have back pressure, unlike Akka's Reactive Streams implementation. But RxJava seems to be working on adding back pressure.
Both frameworks will be able to interact through the Reactive Streams SPI.
So you will be able to do very similar things. According to Mathias, the difference will be that the Akka implementation is based internally on actors, not on multithreading, and as a result will be more performant.
My source for this information is a talk that Mathias gave last week at the Dutch Scala user group.
edit: I stand corrected wrt back pressure support in RxJava. If you follow Erik's link you can read what back pressure means.
Akka Streams, being based on actors, provides interop between actors and streams, e.g.:
reading from an actor and passing it to streams and
reading from streams and passing it to actors
Does it make sense to use Apache Camel for asynchronous requests? Or should I use simple MoM with a JMS server?
There are no Enterprise Integration Patterns that I'll require.
Any help would be useful.
Even if you are not using any Enterprise Integration Patterns (yet), Camel is great at integrating messaging into your application while hiding all of the middleware APIs. It also lets you switch easily between the various middleware technologies, usually by changing just one or two strings.
e.g. see these links for more detail
POJO producing
POJO consuming
Spring remoting
There is a POJO Messaging Example that walks you through using Camel purely as a way to integrate messaging into your POJOs.
It does make sense to use Camel for async calls, especially because it can handle callbacks cleanly. For example:
template.asyncCallback("activemq:queue:longTasks", request, callback);
Here callback is an org.apache.camel.spi.Synchronization object that handles both responses and failure conditions.
To add to the other answers:
Camel also provides many very useful utils for common programming tasks.
I haven't used it in 8 months, but I'll be using it on my next project...sprint 0 next week.
Perhaps I too will have more questions on the latest version of Camel soon.
Happy coding.