Today I found that Java has good concurrency frameworks like Akka, and that there are also reactive programming frameworks like RxJava for multithreading in an application. But I'm still confused: why are both considered better than the Java concurrency framework?
Nowadays reactive programming is a mature topic, and most languages have support for Functional Reactive Programming; Netflix, for example, provides reactive APIs for more than one language. RxJava is one of those APIs, usable from Java, Scala, etc. According to RxJava, they internally use actors for maintaining multithreading, and Akka also uses actors for multithreaded programming.
So, what is the difference between Akka and the reactive programming approach, and why are they better than Java concurrency?
According to Mathias Doenitz, at this point in time RxJava doesn't have back pressure, unlike Akka's Reactive Streams implementation. But RxJava seems to be working on adding back pressure.
Both frameworks will be able to interact through the Reactive Streams SPI.
So you will be able to do very similar things. According to Mathias, the difference will be that the Akka implementation is based internally on actors, not on multithreading, and as a result will be more performant.
My source for this information is a talk that Mathias gave last week at the Dutch Scala user group.
Edit: I stand corrected with respect to back pressure support in RxJava. If you follow Erik's link you can read what back pressure means.
Akka Streams being based on actors provides interop between actors and streams, e.g.:
reading from actor and passing it to streams and
reading from streams and passing it to actors
I may be wrong, but as far as I understand, the whole Reactive/Event Loop thing, and Netty in particular, was invented as an answer to the C10K+ problem. It has obvious drawbacks: all your code becomes async, with ugly callbacks and meaningless stack traces, and is therefore hard to maintain and to reason about.
Go, with its goroutines, was one solution: you can write sync code and still handle C10K+. Now Java comes up with Loom, which essentially copies Go's solution; soon we will have Fibers and Continuations and will be able to write sync code again.
So the questions are:
When Loom is released in production, doesn't it make Netty kinda obsolete?
If we have Fibers and Continuations in Java, can we write nice Sync code and be ok with C10K+ without Netty?
Are there any advantages, for performance or solving C10K+, in writing Async code and using Netty, after production release of Loom?
I understand that Netty is more than just a Reactive/Event Loop framework; it also has all the codecs for various protocols, whose implementations will remain useful anyway, even afterwards.
I'm focusing on the reactive parts of Netty, because those are what you seem to mostly want to address, and answering on a general level:
Currently, reactive programming paradigms are often used to solve performance problems, not because they fit the problem. Those cases should be covered completely by Project Loom.
However, some problems may remain where the reactive programming approach makes sense and is more straightforward to read than imperative code.
Reactive frameworks are typically stream oriented and are well suited to combining elements and operations on different entity/data streams. They also provide straightforward local event bus solutions with their provider/subscriber model. In such cases the reactive model might still be the best choice, performant and more readable than an imperative approach. But indeed, Project Loom should make obsolete all the "misuse" that stems from the lack of better support in the native language structures.
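To make the "write sync code again" point a bit more concrete, here is a minimal sketch. It assumes Java 21 or later, where Loom shipped as virtual threads (rather than under the Fibers name used in the question). `fetch` is a hypothetical stand-in for any blocking call, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VirtualThreadSketch {

    // Hypothetical stand-in for a blocking call (HTTP request, JDBC query, ...).
    static String fetch(int id) throws InterruptedException {
        Thread.sleep(50); // blocks only the virtual thread running this task
        return "response-" + id;
    }

    // Runs `requests` blocking calls, one virtual thread per call (assumes Java 21+).
    static List<String> handleAll(int requests) {
        List<Future<String>> futures = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < requests; i++) {
                final int id = i;
                futures.add(executor.submit(() -> fetch(id)));
            }
        } // close() waits until all submitted tasks have finished
        List<String> results = new ArrayList<>();
        for (Future<String> future : futures) {
            results.add(future.resultNow()); // safe: every task is complete here
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> results = handleAll(10_000);
        System.out.println(results.size() + " responses, first = " + results.get(0));
    }
}
```

The code reads like plain blocking code with no callbacks, which is the readability argument made above; whether it matches Netty's performance for a given workload is something you would have to measure.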
It seems that every major Java release of the last few years has brought new ways to manage concurrent tasks.
In Java 9, we have the Flow API, which resembles the Flowable API of RxJava, but Java 9 has a much simpler set of classes and interfaces.
Java 9
Has a Flow.Publisher, Flow.Subscriber, Flow.Processor, Flow.Subscription, and SubmissionPublisher, and that's about it.
RxJava
Has whole packages of Flow API-like classes, i.e. io.reactivex.flowables, io.reactivex.subscribers, io.reactivex.processors, io.reactivex.observers, and io.reactivex.observables which seem to do something similar.
What are the main differences between these two libraries? Why would someone use the Java 9 Flow library over the much more diverse RxJava library or vice versa?
What are the main differences between these two libraries?
The Java 9 Flow API is not a standalone library but a component of the Java Standard Edition library and consists of 4 interfaces adopted from the Reactive Streams specification established in early 2015. In theory, its inclusion can enable in-JDK specific usages, such as the incubating HttpClient, maybe the planned Async Database Connection in parts, and of course SubmissionPublisher.
RxJava is a Java library that uses the ReactiveX style API design to provide a rich set of operators over reactive (push) dataflows. Version 2, through Flowable and various XxxProcessors, implements the Reactive Streams API, which allows instances of Flowable to be consumed by other compatible libraries; in turn, one can wrap any Publisher into a Flowable to consume it and compose the rich set of operators with it.
So the Reactive Streams API is the minimal interface specification and RxJava 2 is one implementation of it, plus RxJava declares a large set of additional methods to form a rich and fluent API of its own.
RxJava 1 inspired, among other sources, the Reactive Streams specification but couldn't capitalize on it (it had to remain compatible). RxJava 2, being a full rewrite and a separate main version, could embrace and use the Reactive Streams specification (and even expand upon it internally, thanks to the Rsc project) and was released almost a year before Java 9. In addition, it was decided that both v1 and v2 keep supporting Java 6 and thus a lot of Android runtimes. Therefore it couldn't capitalize directly on the Flow API now provided by Java 9, but only through a bridge. Such a bridge is required by and/or provided in other Reactive Streams-based libraries too.
RxJava 3 may target the Java 9 Flow API but this hasn't been decided yet and depending on what features the subsequent Java versions bring (i.e., value types), we may not have v3 within a year or so.
Till then, there is a prototype library called Reactive4JavaFlow which does implement the Flow API and offers a ReactiveX style rich fluent API over it.
Why would someone use the Java 9 Flow library over the much more diverse RxJava library or vice versa?
The Flow API is an interoperation specification and not an end-user API. Normally, you wouldn't use it directly but rather to pass flows around between various implementations of it. When JEP 266 was discussed, the authors didn't find any existing library's API good enough to serve as a default alongside the Flow API (unlike the rich java.util.stream.Stream). Therefore, it was decided that users will have to rely on third-party implementations for now.
You have to wait for existing reactive libraries to support the Flow API natively, through their own bridge implementation or new libraries to be implemented.
Providing a rich set of operators over the Flow API is the only reason a library would implement it. Datasource vendors (e.g., reactive database drivers, network libraries) can start implementing their own data accessors via the Flow API and rely on the rich libraries to wrap those and provide the transformation and coordination for them, without forcing everybody to implement all sorts of these operators.
Consequently, a better question is, should you start using the Flow API-based interoperation now or stick to Reactive Streams?
If you need working and reliable solutions relatively soon, I suggest you stick with the Reactive Streams ecosystem for now. If you have plenty of time or you want to explore things, you could start using the Flow API.
At the beginning, there was Rx, version one. It was a language agnostic specification of reactive APIs that has implementations for Java, JavaScript, .NET. Then they improved it and we saw Rx 2. It has implementations for different languages as well. At the time of Rx 2 Spring team was working on Reactor — their own set of reactive APIs.
And then they all thought: why not make a joint effort and create one API to rule them all. That was how Reactive Commons was set up. A joint research effort for building highly optimized reactive streams compliant operators. Current implementors include RxJava2 and Reactor.
At the same time, the JDK developers realized that reactive stuff is great and worth including in Java. As is usual in the Java world, the de facto standard became de jure. Remember Hibernate and JPA, Joda Time and the Java 8 Date/Time API? So what the JDK developers did was extract the very core of the reactive APIs, the most basic part, and make it a standard. That is how j.u.c.Flow was born.
Technically, j.u.c.Flow is much simpler: it consists of only four simple interfaces, while other libraries provide dozens of classes and hundreds of operators.
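To give a feel for how small the j.u.c.Flow surface is, here is a minimal sketch using only JDK classes: a hand-written Flow.Subscriber fed by SubmissionPublisher. The names `FlowSketch`, `CollectingSubscriber` and `publishAndCollect` are made up for this illustration, and a real subscriber would normally request items in smaller batches.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class FlowSketch {

    // A hand-written Subscriber that simply collects every item it receives.
    static class CollectingSubscriber<T> implements Flow.Subscriber<T> {
        final List<T> received = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);

        @Override public void onSubscribe(Flow.Subscription subscription) {
            subscription.request(Long.MAX_VALUE); // effectively unbounded demand
        }
        @Override public void onNext(T item) { received.add(item); }
        @Override public void onError(Throwable error) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    // Publishes the given items and returns what the subscriber saw.
    static List<String> publishAndCollect(List<String> items) {
        CollectingSubscriber<String> subscriber = new CollectingSubscriber<>();
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            items.forEach(publisher::submit);
        } // close() issues onComplete to the subscriber
        try {
            subscriber.done.await(); // wait for the asynchronous delivery to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return subscriber.received;
    }

    public static void main(String[] args) {
        System.out.println(publishAndCollect(List.of("a", "b", "c"))); // prints [a, b, c]
    }
}
```

Note there is no map, filter or zip anywhere in sight; that is exactly the part the richer libraries add on top.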
I hope, this answers the question "what is the difference between them".
Why would someone choose j.u.c.Flow over Rx? Well, because now it is a standard!
Currently the JDK ships with only one implementation of j.u.c.Flow: the HTTP/2 API, which is actually an incubating API. But in the future we might expect support for it from Reactor and RxJava 2, as well as from other libraries, like reactive DB drivers or even FS IO.
"What are the main differences between these two libraries?"
As you noted yourself, the Java 9 library is much more basic and basically serves as a general API for reactive streams instead of a full-fledged solution.
"Why would someone use the Java 9 Flow library over the much more diverse RxJava library or vice versa?"
Well, for the same reason people use basic library constructs over libraries - one less dependency to manage. Also, due to the fact that the Flow API in Java 9 is more general, it is less constrained by the specific implementation.
What are the main differences between these two libraries?
This is really more of an informative comment (but too long to fit in one). JEP 266: More Concurrency Updates, which is responsible for introducing the Flow API in Java 9, states this in its description (emphasis mine):
Interfaces supporting the Reactive Streams publish-subscribe
framework, nested within the new class Flow.
Publishers produce items
consumed by one or more Subscribers, each managed by a Subscription.
Communication relies on a simple form of flow control (method
Subscription.request, for communicating back pressure) that can be
used to avoid resource management problems that may otherwise occur in
"push" based systems. A utility class SubmissionPublisher is provided
that developers can use to create custom components.
These (very
small) interfaces correspond to those defined with broad participation
(from the Reactive Streams initiative) and support interoperability
across a number of async systems running on JVMs.
Nesting the interfaces within a class is a conservative policy allowing
their use across various short-term and long-term possibilities. There
are no plans to provide network- or I/O-based java.util.concurrent
components for distributed messaging, but it is possible that future JDK
releases will include such APIs in other packages.
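As a rough illustration of the Subscription.request flow control mentioned in the quote, the sketch below has a subscriber that asks for exactly one item at a time. It uses only JDK classes; the class and method names are invented for the example, and the "processing" is just adding to a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class RequestOneSketch {

    // Signals demand for one item at a time, so the publisher can never run ahead
    // of the subscriber by more than its bounded buffer.
    static class OneAtATimeSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> processed = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription subscription) {
            this.subscription = subscription;
            subscription.request(1); // ask for the first item only
        }
        @Override public void onNext(Integer item) {
            processed.add(item);     // "process" the item
            subscription.request(1); // only now signal readiness for the next one
        }
        @Override public void onError(Throwable error) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    static List<Integer> run(int count) {
        OneAtATimeSubscriber subscriber = new OneAtATimeSubscriber();
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            for (int i = 0; i < count; i++) {
                publisher.submit(i); // blocks while the subscriber's buffer is full
            }
        }
        try {
            subscriber.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return subscriber.processed;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000).size()); // prints 1000
    }
}
```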
Why would someone use the Java 9 Flow library over the much more diverse RxJava library or vice versa?
From a wider perspective, this is completely opinion based and depends on factors like the type of application a client is developing and how it uses the framework.
I keep studying and trying the reactive style of coding using Reactor and RxJava. I do understand that reactive coding makes better use of the CPU compared to single-threaded execution.
Is there any concrete comparison between reactive programming and imperative programming in web-based applications?
How much performance gain or throughput can I achieve by using reactive programming over non-reactive programming?
Also what are the advantages and disadvantages of Reactive Programming?
Is there any statistical benchmark?
Well, Reactive Programming means you are doing all your IO-bound tasks, such as network calls, asynchronously. For instance, say your application calls an external REST API or a database; you can do that invocation asynchronously. If you do so, your current thread does not block, and you can serve lots of requests by spawning merely one or a few threads. If you follow the blocking approach, you need one thread to handle each and every request. You may refer to my multi-part blog post (part one, part two and part three) for further details.
Other than that, you may use callbacks to do the same: you can do asynchronous invocation using callbacks. But if you do so, you may sometimes end up in callback hell. Having one callback inside another leads to very complex code which is very hard to maintain. RxJava, on the other hand, lets you write asynchronous code which is much simpler, more composable and more readable. RxJava also provides a lot of powerful operators such as map, zip etc., which make your code much simpler while boosting performance through parallel execution of tasks that do not depend on each other.
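To illustrate composing two independent asynchronous calls without nesting callbacks, here is a small sketch. Note that it does not use RxJava: it uses the JDK's CompletableFuture as a stand-in, where thenCombine plays a role loosely similar to zip for two single-valued results. `fetchUser` and `fetchOrderCount` are hypothetical remote calls simulated locally.

```java
import java.util.concurrent.CompletableFuture;

public class ComposeSketch {

    // Hypothetical remote calls, simulated with supplyAsync.
    static CompletableFuture<String> fetchUser(int id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
    }

    static CompletableFuture<Integer> fetchOrderCount(int id) {
        return CompletableFuture.supplyAsync(() -> id * 2);
    }

    // Both calls are started independently; the results are merged once both arrive.
    static CompletableFuture<String> summary(int id) {
        return fetchUser(id)
                .thenCombine(fetchOrderCount(id),
                        (user, count) -> user + " has " + count + " orders")
                .exceptionally(error -> "summary unavailable"); // one place for error handling
    }

    public static void main(String[] args) {
        System.out.println(summary(21).join()); // prints "user-21 has 42 orders"
    }
}
```

The pipeline stays flat however many steps are chained, which is the readability benefit described above; RxJava adds the same style for multi-valued streams.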
RxJava is not just another Observer implementation with a set of operators; it also gives you good error handling and retry mechanisms, which are really handy.
I have not conducted any benchmarking of RxJava against an imperative programming approach, so I cannot back this up statistically. But I am pretty sure RxJava should yield good performance compared to blocking mechanisms.
Update
Since I gathered more experience over time, I thought of adding more points to my answer.
Based on the article, ReactiveX is a library for composing asynchronous and event-based programs by using observable sequences. I recommend going through this introductory article first.
These are some properties of reactive systems: Event Driven, Scalable, Resilient, Responsive
When it comes to RxJava, it offers two main facilities to a programmer. First, it offers a nice composable API using a rich set of operators such as zip, concat, map etc. This yields simpler and more readable code. When it comes to code, readability and simplicity are the most important properties. Second, it provides excellent abstractions that enable concurrency to become declarative.
A popular misconception is that Rx is multithreaded by default. In fact, Rx is single-threaded by default. If you want to do things asynchronously, you have to say so explicitly, using the subscribeOn and observeOn operators and passing the relevant schedulers. RxJava gives you thread pools for asynchronous tasks. There are many schedulers, such as IO, Computation and so forth. The IO scheduler, as the name suggests, is best suited for IO-intensive tasks such as network calls, whereas the Computation scheduler is good for more CPU-intensive tasks. You can also hook up your own Executor services with RxJava. The built-in schedulers mainly save you from maintaining your own Executor services, making your code simpler.
Finally a word on subscribeOn and observeOn
In the Rx world, there are generally two things you want to control the concurrency model for:
The invocation of the subscription
The observing of notifications
SubscribeOn: specify the Scheduler on which an Observable will operate.
ObserveOn: specify the Scheduler on which an observer will observe this Observable
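The idea of choosing one thread pool for producing a value and another for observing it can be sketched without RxJava on the classpath. The following is only an analogy built on the JDK's CompletableFuture, not RxJava's API: the executor passed to supplyAsync plays the part of subscribeOn, and the one passed to thenApplyAsync plays the part of observeOn. The pool names are made up.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SchedulerSketch {

    // A single-threaded pool whose thread carries a recognisable name.
    static ExecutorService namedPool(String name) {
        return Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, name);
            thread.setDaemon(true);
            return thread;
        });
    }

    // Returns "<thread that produced the value> -> <thread that observed it>".
    static String run() {
        ExecutorService io = namedPool("io-pool");
        ExecutorService cpu = namedPool("cpu-pool");
        try {
            return CompletableFuture
                    // the work runs on the IO pool (roughly what subscribeOn selects)
                    .supplyAsync(() -> Thread.currentThread().getName(), io)
                    // the downstream step runs on the CPU pool (roughly what observeOn selects)
                    .thenApplyAsync(producedOn ->
                            producedOn + " -> " + Thread.currentThread().getName(), cpu)
                    .join();
        } finally {
            io.shutdown();
            cpu.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "io-pool -> cpu-pool"
    }
}
```

Nothing here is multithreaded unless an executor is named explicitly, which mirrors the point that Rx is single-threaded until you tell it otherwise.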
Disadvantages
More memory intensive to store streams of data most of the time (since it is based on streams over time).
Might feel unconventional to learn at the start (needs everything to be a stream).
Most complexities have to be dealt with at the time of declaration of new services.
Lack of good and simple resources to learn.
Often confused to be equivalent to Functional Reactive Programming.
Apart from what is already mentioned in other responses regarding non-blocking features, another great feature of reactive programming is its use of backpressure. Normally it is used in situations where your publisher emits more information than your consumer can process.
So having this mechanism you can control the flow of traffic between both and avoid nasty out of memory problems.
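Here is a small JDK-only sketch of what "avoiding out of memory problems" can look like in practice. It assumes a deliberately stalled consumer that never signals demand; the publisher's per-subscriber buffer is bounded (SubmissionPublisher's default is Flow.defaultBufferSize()), so once it fills up, offer() reports drops instead of queueing without limit. The class names are invented for the example.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class DropSketch {

    // A consumer that has stalled: it subscribes but never signals any demand.
    static class StalledSubscriber implements Flow.Subscriber<Integer> {
        @Override public void onSubscribe(Flow.Subscription subscription) { /* no request(n) */ }
        @Override public void onNext(Integer item) { }
        @Override public void onError(Throwable error) { }
        @Override public void onComplete() { }
    }

    // Offers `total` items without blocking; returns {accepted, dropped}.
    static int[] offerAll(int total) {
        int accepted = 0;
        int dropped = 0;
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new StalledSubscriber());
            for (int i = 0; i < total; i++) {
                // offer() never blocks; a negative result means the item was dropped
                if (publisher.offer(i, null) < 0) {
                    dropped++;
                } else {
                    accepted++;
                }
            }
        }
        return new int[] {accepted, dropped};
    }

    public static void main(String[] args) {
        int[] result = offerAll(10_000);
        System.out.println("accepted=" + result[0] + ", dropped=" + result[1]);
    }
}
```

Dropping is only one possible policy; using submit() instead would block the producer until the consumer catches up, which is the other common way to keep memory bounded.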
You can see some practical examples of reactive programming here: https://github.com/politrons/reactive
And about back pressure here: https://github.com/politrons/Akka/blob/master/src/main/scala/stream/BackPressure.scala
By the way, the only disadvantage of reactive programming is the learning curve, because you're changing the programming paradigm. But nowadays all important companies respect and follow the reactive manifesto.
Reactive Programming is a style of micro-architecture involving intelligent routing and consumption of events.
The point of reactive is that you can do more with less; specifically, you can process higher loads with fewer threads.
Reactive types are not intended to allow you to process your requests or data faster. Their strength lies in their capacity to serve more requests concurrently, and to handle operations with latency, such as requesting data from a remote server, more efficiently.
They allow you to provide a better quality of service and a predictable capacity planning by dealing natively with time and latency without consuming more resources.
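A toy sketch of "higher loads with fewer threads": many simulated remote calls, each with some latency, are all in flight at once while a single timer thread completes them. It uses only JDK classes; the latency is simulated with a scheduler rather than real network IO, and the names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FewThreadsSketch {

    // Simulates `requests` remote calls of `latencyMillis` each, served by ONE timer thread.
    static int serveAll(int requests, long latencyMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            List<CompletableFuture<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                CompletableFuture<Integer> reply = new CompletableFuture<>();
                // The simulated server answers after the latency; no thread is parked per request.
                timer.schedule(() -> { reply.complete(id); }, latencyMillis, TimeUnit.MILLISECONDS);
                pending.add(reply);
            }
            // Only the caller waits here, for all replies together.
            return pending.stream().mapToInt(CompletableFuture::join).sum();
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int sum = serveAll(1_000, 50);
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("sum=" + sum + " in about " + elapsedMillis + " ms");
    }
}
```

With one blocking thread per request, the same load would need 1,000 threads to finish in roughly one latency period; no single request gets faster here, there are just fewer threads involved.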
From
https://blog.redelastic.com/what-is-reactive-programming-bc9fa7f4a7fc
https://spring.io/blog/2016/06/07/notes-on-reactive-programming-part-i-the-reactive-landscape
https://spring.io/blog/2016/07/28/reactive-programming-with-spring-5-0-m1
Advantages
Cleaner code, more concise
Easier to read (once you get the hang of it)
Easier to scale (pipe any operation)
Better error handling
Event-driven inspired -> plays well with streams (Kafka, RabbitMQ, etc.)
Backpressure (client can control flow)
Disadvantages
Can become more memory intensive in some cases
Somewhat steep learning curve
Reactive programming is a kind of imperative programming.
Reactive programming is a kind of parallel programming.
You can achieve a performance gain over single-threaded execution only if you manage to create parallel branches. Whether they are executed by multiple threads or by reactive constructs (which are in fact asynchronous procedures) does not matter.
The single advantage of reactive programming over multithreaded programming is lower memory consumption (each thread requires 0.5...1 megabyte). The disadvantage is less easy programming.
UPDATE (Aug 2020). Parallel programming comes in two flavours: multithreaded programming, where the main kind of activity is a thread, and asynchronous programming, where the main kind of activity is an asynchronous procedure (including actors, which are repeatable asynchronous procedures). In multithreaded programming, various means of communication are used: unbounded queues, bounded (blocking) queues, binary and counting semaphores, CountDownLatches and so on. Moreover, there is always the possibility of creating your own means of communication. In asynchronous programming, until recently, only two kinds of communicators were used: the future for non-repeatable asynchronous procedures, and the unbounded queue for actors. An unbounded queue causes problems when the producer works faster than the consumer. To cope with this problem, a new communication protocol was invented: the reactive stream, which is a combination of an unbounded queue and a counting (asynchronous) semaphore that makes the queue bounded. This is a direct analogue of the blocking queue in multithreaded programming. And programming with reactive streams was proudly called Reactive Programming (imagine if, in multithreaded programming, programming with blocking queues were called Blocking Programming). But again, the asynchronous programmer was given no means of creating their own communication tools, and the asynchronous semaphore cannot be used on its own, only as part of a reactive stream. That said, the theory of asynchronous programming, including the theory of reactive programming, lags far behind the theory of multithreaded programming.
A fancy addition to reactive streams is mapping/filtering functions that allow writing linear pipelines like
publisher
.map(mappingFunction)
.filter(filterFunction)
.flatMap(...)
etc.
But this is not an exclusive feature of reactive programming. And it only allows creating linear pipelines, while in multithreaded programming it is easy to create computational graphs of arbitrary topology.
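For contrast with a linear pipeline, here is one way a small non-linear (diamond-shaped) computation graph can be written in plain JDK code. This is just a sketch using CompletableFuture stages rather than a reactive-streams library, and the arithmetic in each node is arbitrary.

```java
import java.util.concurrent.CompletableFuture;

public class GraphSketch {

    // Diamond topology: one source feeds two independent branches that are joined again.
    //
    //            source
    //           /      \
    //      left(+1)   right(*10)
    //           \      /
    //            sum
    static int diamond(int input) {
        CompletableFuture<Integer> source = CompletableFuture.supplyAsync(() -> input);
        CompletableFuture<Integer> left = source.thenApplyAsync(x -> x + 1);
        CompletableFuture<Integer> right = source.thenApplyAsync(x -> x * 10);
        return left.thenCombine(right, Integer::sum).join();
    }

    public static void main(String[] args) {
        System.out.println(diamond(4)); // prints 45
    }
}
```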
I am trying to set up a Kafka system. Since most of the existing code in my project is already in PHP, I will most probably be writing the producers in PHP as well. But I am far less constrained when it comes to choosing a language for the consumer. Now that there are so many clients to choose from, I am in a fix.
In order to choose the right tech here, what are the various factors that should be kept in mind?
Would especially like to apply this knowledge to choose between java client vs node client(multithreaded model vs async model)
Any help will be highly appreciated.
The Java client is the most advanced client and is officially supported by the Kafka project; most other clients are third-party projects and many do not implement all available features.
Thus, I would recommend using the Java clients.
Kafka is basically written in pure Java and Kafka's native API is Java, so this is the only language where you're not using a third-party library. You always have an edge over other languages, which carry additional overhead.
Node.js isn't optimized for high-throughput applications such as Kafka. So if you need the high processing rates that come standard on Kafka, go with Java, or perhaps C++.
Also, I believe Kafka consumer clients written in Java have good community support. So it makes sense to implement the consumer in Java, as long as you don't have any other dependency stopping you from doing so.
Also, check this out for the benchmarking results using various Kafka clients. The results are contrasting.
Client Type    Throughput (no. of messages)
Java           40,000 - 50,0000
Go             28,000 - 30,0000
Node           6,000 - 8,0000
Kafka-pixy     700 - 800
Logstash       250
As far as Kafka goes, I'd use any of the languages with an official Confluent supported client: JVM, C/C++, .NET, Python, Go
I'm sure you can get others to work, like Node or PHP, and maybe those can use the C library, but I would prefer something with official language support and a broader user base to ask questions of.
I know that there is a supported Scala DSL for Camel. Apart from that
Is it realistic to replace Java (the language) completely by Scala for a Camel based project?
What kinds of problems are known to exist?
Which workarounds exist for those problems (other than using Java)?
I am mainly looking for less boilerplaty code.
Akka offers stable Scala-idiomatic Camel integration.
The akka-camel module allows actors, untyped actors and typed actors to receive and send messages over a great variety of protocols and APIs. This section gives a brief overview of the general ideas behind the akka-camel module, the remaining sections go into the details. In addition to the native Scala and Java actor API, actors can now exchange messages with other systems over large number of protocols and APIs such as HTTP, SOAP, TCP, FTP, SMTP or JMS, to mention a few. At the moment, approximately 80 protocols and APIs are supported.
Apart from that, I'm sure this replacement is possible thanks to good interop, and there could hardly be any Scala-specific issues that don't also apply to Java. E.g., the Akka actors used for publishing to/consuming from Camel endpoints are based on java.util.concurrent, and the only problem I can think of is a fixable bug in the library.
In the meantime a relatively simple Scala DSL has been developed for Camel, that should have the functionality of the Java DSL.
To decide if it is realistic for you, consider:
- The quality of the IDE support for the languages
- The Scala language complexity
- The Scala/Java language popularity
- DSL extension possibilities. In Scala, it should be possible (with some Scala magic) to extend the DSL (add additional DSL elements)
If you decide to try it out, it would be great if you shared with the Apache Camel community your impressions of:
code readability, code maintainability, code efficiency, developer satisfaction, code size, the number of "man-days".
Since then (2010-2011), there is now (Sept 2016) a recent initiative for Akka Streams integration, codenamed Alpakka.
We believe that Akka Streams can be the tool for building a modern alternative to Apache Camel. That will not happen by itself overnight and this is a call for arms for the community to join us on this mission. The biggest asset of Camel is its rich set of endpoint components. We would like to see that similar endpoints are developed for Akka Streams.
See "akka/akka-stream-contrib".