How are reactive streams different from non-blocking I/O? What is it that the Java 8 future API cannot do that reactive streams can?
Like non-blocking I/O, Reactive Extensions (ReactiveX) offers a non-blocking programming style.
But beyond that, ReactiveX models everything as a stream and offers many operations on streams. This functionality makes asynchronous programming much easier and saves us from callback hell ;)
I recommend reading this document:
http://reactivex.io/intro.html
And here are good slides on ReactiveX:
https://speakerdeck.com/benjchristensen/applying-reactive-programming-with-rxjava-at-goto-chicago-2015
http://sssslide.com/speakerdeck.com/android10/the-mayans-lost-guide-to-rxjava-on-android
The main reason is that ReactiveX provides operators to run your pipeline asynchronously, such as subscribeOn or observeOn. It also provides functionality that Java 8 (or Scala's default functional programming) does not provide.
Here you can see an example of the asynchronous operators, to understand how they work: https://github.com/politrons/reactive/blob/master/src/test/java/rx/observables/scheduler/ObservableAsynchronous.java
And more general RxJava examples here: https://github.com/politrons/reactive
Non-blocking I/O is a lower-level abstraction than reactive streams, which offer you constructs like these (consider serviceX to be a Retrofit client):
Observable.zip(
        service1.getFoo(1),
        service2.doBar(xyz),
        service3.makeBaz("meh"),
        (a, b, c) -> service4.somethingElse(a + b + c))
    .onErrorReturn(t -> "error");
This, in 7 lines, contacts 3 services in parallel and, when all of them have returned, contacts a 4th with their results.
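For comparison, roughly the same fan-out/fan-in is possible with plain Java 8 CompletableFuture, though without Rx's stream operators. This is only a sketch: the service methods here are hypothetical stand-ins, not real Retrofit calls.

```java
import java.util.concurrent.CompletableFuture;

public class ZipWithFutures {
    // Hypothetical stand-ins for the remote services.
    static CompletableFuture<String> getFoo(int id)    { return CompletableFuture.supplyAsync(() -> "foo" + id); }
    static CompletableFuture<String> doBar(String s)   { return CompletableFuture.supplyAsync(() -> "bar-" + s); }
    static CompletableFuture<String> makeBaz(String s) { return CompletableFuture.supplyAsync(() -> "baz-" + s); }
    static CompletableFuture<String> somethingElse(String in) {
        return CompletableFuture.supplyAsync(() -> "result:" + in);
    }

    static String callAll() {
        CompletableFuture<String> f1 = getFoo(1);
        CompletableFuture<String> f2 = doBar("xyz");
        CompletableFuture<String> f3 = makeBaz("meh");
        // Fan-in: once all three complete, call the 4th service with their results.
        return CompletableFuture.allOf(f1, f2, f3)
                .thenCompose(v -> somethingElse(f1.join() + f2.join() + f3.join()))
                .exceptionally(t -> "error")   // rough analogue of onErrorReturn
                .join();
    }

    public static void main(String[] args) {
        System.out.println(callAll()); // result:foo1bar-xyzbaz-meh
    }
}
```

It works, but notice how the combination logic and error handling are hand-rolled; Rx gives you these as composable operators.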
For a long time, Spring has been recommending RestTemplate for sync http requests. However, nowadays the documentation says:
NOTE: As of 5.0 this class is in maintenance mode, with only minor requests for changes and bugs to be accepted going forward. Please, consider using the org.springframework.web.reactive.client.WebClient which has a more modern API and supports sync, async, and streaming scenarios.
But I haven't been able to see how one is recommended to use WebClient for sync scenarios. There is this in the documentation:
WebClient can be used in synchronous style by blocking at the end for the result
and I've seen some codebases using .block() all over the place. However, my problem with this is that with some experience in reactive frameworks, I've grown to understand that blocking a reactive call is a code smell and should really be used in testing only. For example this page says
Sometimes you can only migrate part of your code to be reactive, and you need to reuse reactive sequences in more imperative code.
Thus if you need to block until the value from a Mono is available, use Mono#block() method. It will throw an Exception if the onError event is triggered.
Note that you should avoid this by favoring having reactive code end-to-end, as much as possible. You MUST avoid this at all cost in the middle of other reactive code, as this has the potential to lock your whole reactive pipeline.
So is there something I've missed that avoids block()s but allows you to do sync calls, or is using block() everywhere really the way?
Or is the intent of WebClient API to imply that one just shouldn't do blocking anywhere in your codebase anymore? As WebClient seems to be the only alternative for future http calls offered by Spring, is the only viable choice in the future to use non-blocking calls throughout your codebase, and change the rest of the codebase to accommodate that?
There's a related question here but it focuses on the occurring exception only, whereas I would be interested to hear what should be the approach in general.
Firstly, according to the WebClient Javadoc:
public interface WebClient
Non-blocking, reactive client to perform HTTP requests, exposing a fluent, reactive API over underlying HTTP client libraries such as
Reactor Netty. Use static factory methods create() or create(String),
or builder() to prepare an instance.
So WebClient is not designed to be blocking.
However, the responses WebClient returns (via its ResponseSpec) are of type <T> reactor.core.publisher.Flux<T> or <T> reactor.core.publisher.Mono<T>. Flux and Mono, from the Reactor project, are the types that have blocking methods.
WebClient was designed to be a reactive client.
As you might have seen with reactive libraries in other languages (for example RxJS for JavaScript), reactive programming is usually based on functional programming.
What Flux and Mono from the Reactor project add is that they allow you to call block() to get synchronous execution, without the need for functional programming.
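The same "extractor" idea exists with plain JDK futures: join()/get() park the calling thread until the value arrives, which is conceptually what Mono.block() does for reactive types. A minimal JDK-only sketch:

```java
import java.util.concurrent.CompletableFuture;

public class BlockingExtractor {
    static int compute() {
        CompletableFuture<Integer> price = CompletableFuture.supplyAsync(() -> 41 + 1);
        // join() is the blocking "extractor": it leaves the async world
        // and parks the current thread until the result is ready.
        return price.join();
    }

    public static void main(String[] args) {
        System.out.println(compute()); // 42
    }
}
```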
Here is a part of an article that I find very interesting:
Extractors: The Subscribers from the Dark Side
There is another way to
subscribe to a sequence, which is to call Mono.block() or
Mono.toFuture() or Flux.toStream() (these are the "extractor"
methods — they get you out of the Reactive types into a less flexible,
blocking abstraction). Flux also has converters collectList() and
collectMap() that convert from Flux to Mono. They don’t actually
subscribe to the sequence, but they do throw away any control you
might have had over the subscription at the level of the individual
items.
Warning A good rule of thumb is "never call an extractor". There are
some exceptions (otherwise the methods would not exist). One notable
exception is in tests because it’s useful to be able to block to allow
results to accumulate. These methods are there as an escape hatch to
bridge from Reactive to blocking; if you need to adapt to a legacy
API, for instance Spring MVC. When you call Mono.block() you throw
away all the benefits of the Reactive Streams
So can you do synchronous programming without using the block() operations?
Yes, you can, but then you have to think in terms of functional programming
for your application.
Example
public void doSomething1() {
    webClientCall_1....subscribe(response1 -> {
        // ...do something else...
        webClientCall_2....subscribe(response2 -> {
            // ...do something more, with response1 and response2 available here...
        });
    });
}
This is called subscribe callback hell. You can avoid it using the .block() methods, but again, as the quoted article mentions, those throw away the reactive nature of the library.
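The usual fix for this nesting is composition (flatMap in Reactor) rather than blocking. Since Reactor isn't needed to show the shape, here is the same flattening idea with JDK futures, where thenCompose plays the role of flatMap; call1/call2 are hypothetical stand-ins for the two WebClient calls:

```java
import java.util.concurrent.CompletableFuture;

public class FlattenCalls {
    // Hypothetical stand-ins for two dependent remote calls.
    static CompletableFuture<String> call1()         { return CompletableFuture.supplyAsync(() -> "r1"); }
    static CompletableFuture<String> call2(String r) { return CompletableFuture.supplyAsync(() -> r + "+r2"); }

    static String run() {
        // Composition instead of nesting: each step returns a future,
        // and thenCompose flattens it into one pipeline.
        return call1()
                .thenCompose(FlattenCalls::call2)
                .join();
    }

    public static void main(String[] args) {
        System.out.println(run()); // r1+r2
    }
}
```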
I have a service which uses springs RestTemplate to call out to multiple urls.
To improve performance I'd like to perform these requests in parallel. Two options available to me are:
java 8 parallel streams leveraging the fork-join common pool
completable future using isolated thread pool
Just wondering: is it best practice to use parallel streams with blocking I/O calls?
A ForkJoinPool isn't ideal for IO work, since you don't gain any of the benefits of its work-stealing properties. If you planned to use the commonPool and other parts of your app did as well, you might interfere with them. A dedicated thread pool, an ExecutorService for example, is probably the better solution of those two.
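A minimal sketch of the dedicated-pool option; fetch is a hypothetical stand-in for a blocking HTTP call:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class IoPool {
    // Stand-in for a blocking HTTP call.
    static String fetch(String url) { return "body-of-" + url; }

    static String fetchBoth() {
        // A dedicated pool for blocking IO, isolated from the commonPool.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> f1 = CompletableFuture.supplyAsync(() -> fetch("url1"), pool);
            CompletableFuture<String> f2 = CompletableFuture.supplyAsync(() -> fetch("url2"), pool);
            return f1.join() + "|" + f2.join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchBoth()); // body-of-url1|body-of-url2
    }
}
```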
I'd like to suggest something even better. Instead of writing all the async wrapping code yourself, consider using Spring's AsyncRestTemplate. It's included in the Spring Web library and its API is almost identical to RestTemplate.
Spring's central class for asynchronous client-side HTTP access.
Exposes similar methods as RestTemplate, but returns ListenableFuture
wrappers as opposed to concrete results.
[...]
Note: by default AsyncRestTemplate relies on standard JDK facilities
to establish HTTP connections. You can switch to use a different HTTP
library such as Apache HttpComponents, Netty, and OkHttp by using a
constructor accepting an AsyncClientHttpRequestFactory.
ListenableFuture instances can easily be converted to CompletableFuture instances through ListenableFuture::completable().
As noted in the Javadoc, you can control which async mechanism you want to use by specifying an AsyncClientHttpRequestFactory. There are a number of built-in implementations, one for each of the libraries listed. Internally, some of these libraries might do what you suggested and run blocking IO on dedicated thread pools. Others, like Netty (if memory serves), use non-blocking IO to run the connections. You might gain some benefit from that.
Then it's up to you how you reduce the results. With CompletableFuture, you have access to the anyOf and allOf helpers and any of the combination instance methods.
For example,
URI exampleURI = URI.create("https://www.stackoverflow.com");
AsyncRestTemplate template = new AsyncRestTemplate(/* specific request factory */);

var future1 = template.exchange(exampleURI, HttpMethod.GET, null, String.class).completable();
var future2 = template.exchange(exampleURI, HttpMethod.GET, null, String.class).completable();
var future3 = template.exchange(exampleURI, HttpMethod.GET, null, String.class).completable();

CompletableFuture.allOf(future1, future2, future3).thenRun(() -> {
    // you're done
});
AsyncRestTemplate has since been deprecated in favor of Spring WebFlux's WebClient. That API is considerably different, so I won't go into it (except to say that it also lets you get back a CompletableFuture).
Completable future would be a better way to do this, as it is semantically more related to the task and you might keep the code flow going while the task proceeds.
If you use streams, besides the awkwardness of lambdas with exception handling inside, and the fact that a stream pipeline is semantically less related to the task, you will have to wait for all of them to finish even if they are running in parallel. To avoid that you would need futures, but then you are back to the first solution.
You might consider a mix, using streams to create the futures. But given that this is a set of blocking IO requests, you will probably not have enough requests or time to take advantage of parallel streams; the library will probably not split the tasks in parallel for you, and you will be better off with a loop.
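The mix mentioned above can look like this: one stream pass maps each request to a CompletableFuture on an explicit pool, and a second pass joins them in order. fetch is a hypothetical stand-in for the blocking call:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class StreamOfFutures {
    // Stand-in for blocking IO.
    static String fetch(String url) { return "ok-" + url; }

    static List<String> fetchAll(List<String> urls) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // First pass: kick off all requests without blocking.
            List<CompletableFuture<String>> futures = urls.stream()
                    .map(u -> CompletableFuture.supplyAsync(() -> fetch(u), pool))
                    .collect(Collectors.toList());
            // Second pass: join, preserving the input order.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(List.of("a", "b")));
    }
}
```

The two passes matter: mapping and joining in a single pass would serialize the requests, since each join would block before the next request is submitted.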
I came across several instances when people were trying to persuade me into using RxJava instead of Android's standard AsyncTask construct.
In my opinion RxJava offers a lot more features but loses in simplicity against AsyncTask.
Are there any use cases that suit one approach better than the other or even more general can RxJava even be considered superior?
The full power of RxJava is visible when you use it on Java 8, preferably with a library like Retrofit. It allows you to trivially chain operations together, with full control of error handling. For example, consider the following code given id: an int that specifies the order and apiClient: a Retrofit client for the order management microservice:
apiClient
    .getOrder(id)
    .subscribeOn(Schedulers.io())
    .flatMapIterable(Order::getLineItems)
    .flatMap(lineItem ->
        apiClient.getProduct(lineItem.getProductId())
            .subscribeOn(Schedulers.io())
            .map(product -> product.getCurrentPrice() * lineItem.getCount()),
        5)
    .reduce((a, b) -> a + b)
    .retry((count, e) -> count < 2 && (e instanceof RetrofitError))
    .onErrorReturn(e -> -1)
    .subscribe(System.out::println);
This will asynchronously calculate the total price of an order, with the following properties:
at most 5 requests against the API in flight at any one time (and you can tweak the IO scheduler to have a hard cap for all requests, not just for a single observable chain)
up to 2 retries in case of network errors
-1 in case of failure (an antipattern TBH, but that's another discussion)
Also, IMO the .subscribeOn(Schedulers.io()) after each network call should be implicit - you can do that by modifying how you create the Retrofit client. Not bad for 11+2 lines of code, even if it's more backend-ish than Android-ish.
RxBinding/RxAndroid by Jake Wharton provides some nice threading functionality that you can use to make async calls, but RxJava provides waaay more benefits/functionality than just dealing with async threading. That said, there is a pretty steep learning curve (IMO). Also, it should be noted that there is nothing wrong with using AsyncTasks; you can just write more eloquent solutions with Rx (also, IMO).
TLDR you should make an effort to use it. Retrofit and RxJava work together nicely for your AsyncTask replacement purposes.
I am trying to understand when to use Akka Futures and found this article to be a little bit more helpful than the main Akka docs. So it looks like Akka Futures do exactly the same thing as Java 7 Futures. So I ask:
Outside the context of an actor system, what benefits do Akka Futures have over Java Futures? When to use each?
Within the context of an actor system, why ever use an Akka Future? Aren't all actor-to-actor messages asynchronous, concurrent and non-blocking?
Akka Futures implement an asynchronous style of communication, while Java 7 Futures implement a synchronous approach. Yes, they both do the same thing, communication, but in quite different ways.
A producer-consumer pair can interact in two ways: synchronously and asynchronously. The synchronous approach assumes the consumer has its own thread and performs a blocking operation to get the next produced message, e.g. BlockingQueue.take(). In the asynchronous approach, the consumer does not own a thread; it is just an object with at least two methods: one to store a message and one to process it. The producer calls the store method, just as it calls Queue.put(m) in the synchronous approach, but this method also initiates execution of the consumer's processing method on a common thread pool.
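The contrast can be seen with JDK types alone: a plain Future makes the consumer's thread block on get(), while a CompletableFuture callback runs the processing step on a pool thread, so the consumer owns no thread of its own. A small sketch:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SyncVsAsync {
    static String syncStyle() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> f = pool.submit(() -> "msg");
            return f.get();   // consumer's thread blocks until the message arrives
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    static String asyncStyle() {
        // No blocking consumer thread: the processing callback is invoked
        // on a pool thread once the value is produced.
        return CompletableFuture.supplyAsync(() -> "msg")
                .thenApply(m -> m + "-processed")
                .join();   // join here only to observe the final result
    }

    public static void main(String[] args) {
        System.out.println(syncStyle());
        System.out.println(asyncStyle());
    }
}
```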
Update:
As for the 2nd question (why ever use an Akka Future):
Creating a Future looks (and is) simpler than creating an Actor; code for a chain of Futures is more compact and more readable than the equivalent Actor code.
Note, however, that a Future can pass only a single value (message), while an Actor can handle a sequence of messages. But sequences can be handled with Akka Streams. So the question arises: why ever use Akka Actors? I invite more experienced developers to answer this question. Generally, I think that if your task can be solved with Futures, use Futures; if it can be solved with Streams, use Streams; if with Akka Actors, use Actors; otherwise, look for another framework.
For the first part of your question, I agree with Alexei Kaigorodov's answer.
For the second part of your question:
It is useful to use a Future internally when actor responses need to be combined in a very specific way. For example, let's say that the Master actor needs to perform several blocking database queries and then aggregate their results, so Master sends each query to a Worker and then aggregates the responses. If the query results can be aggregated in any order (e.g. Master is just summing row counts or whatever), then it makes sense for each Worker to send its results to Master via a callback. However, if the results need to be combined in a very specific order, then it is easier for each Worker to immediately return a Future and for Master to then go about manipulating these Futures in the correct order.

This could be done via callbacks as well, but then Master would need to figure out which query result is which in order to put them in the correct order, and it would be much more difficult to optimize the code. For example, if the results of query1 can be immediately aggregated with the results of query2, then by using a Future this logic can go directly into the dispatch code, where the identities of all queries are already known; using a callback would instead require Master to identify each query result and also determine whether it can be aggregated with any other query results that have already returned.
I want to parse multiple files to extract the required data and then write the output into an XML file. I have used Callable Interface to implement this. My colleague asked me to use Java 8 feature which does this job easily. I am really confused which one of them I should use now.
list.parallelStream().forEach(a -> {
    System.out.println(a);
});
Using concurrency or a parallel stream only helps if you have independent tasks to work on. A good example of when you wouldn't do this is when you are locking on a shared resource, e.g.
// makes no sense to use parallel here.
list.parallelStream().forEach(a -> {
    // locks System.out, so only one thread at a time can do any work.
    System.out.println(a);
});
However, as a general rule, I would use parallelStream for processing data instead of using the concurrency libraries directly, because:
a functional style of coding discourages shared mutable state. (Actually, you are not supposed to have any mutable state in functional programming, but Java is not really a functional language.)
it's easier to write and understand for processing data.
it's easier to test whether using parallel helps or not. Most likely it won't, and you can just as easily change it back to being serial.
IMHO, given that the chance that parallel code will really help is low, the best feature of parallelStream is not how simple it is to add, but how simple it is to take out.
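That reversibility is literally a one-word change; the pipeline and the result are identical either way:

```java
import java.util.List;
import java.util.stream.Stream;

public class ParallelToggle {
    static int sumOfSquares(List<Integer> xs, boolean parallel) {
        // Same pipeline; only the stream source changes.
        Stream<Integer> stream = parallel ? xs.parallelStream() : xs.stream();
        return stream.mapToInt(x -> x * x).sum();
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4);
        System.out.println(sumOfSquares(xs, false)); // 30
        System.out.println(sumOfSquares(xs, true));  // 30
    }
}
```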
The concurrency library is better if you have ad hoc work which is difficult to model as a stream of data, e.g. a worker pool for client requests might be simpler to implement using an ExecutorService.
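A worker-pool sketch of the kind that last paragraph describes; handle is a hypothetical per-request handler, and the counter only stands in for real work:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class WorkerPool {
    static final AtomicInteger handled = new AtomicInteger();

    // Stand-in for real per-request work.
    static void handle(int requestId) {
        handled.incrementAndGet();
    }

    static int serve(int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < requests; i++) {
            final int id = i;
            pool.submit(() -> handle(id));   // ad hoc work, submitted as it arrives
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println(serve(10)); // 10
    }
}
```

There is no stream of data here: tasks arrive independently and are simply handed to the pool, which is exactly the shape that fits ExecutorService better than parallelStream.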