I am trying to understand when to use Akka Futures and found this article to be a little bit more helpful than the main Akka docs. So it looks like Akka Futures do exactly the same thing as Java 7 Futures. So I ask:
Outside the context of an actor system, what benefits do Akka Futures have over Java Futures? When to use each?
Within the context of an actor system, why ever use an Akka Future? Aren't all actor-to-actor messages asynchronous, concurrent and non-blocking?
Akka Futures implement an asynchronous style of communication, while Java 7 Futures implement a synchronous one. Both serve the same purpose, communication, but in quite different ways.
A producer-consumer pair can interact in two ways: synchronously and asynchronously. In the synchronous approach, the consumer has its own thread and performs a blocking operation to get the next produced message, e.g. BlockingQueue.take(). In the asynchronous approach, the consumer does not own a thread; it is just an object with at least two methods: one to store a message and one to process it. The producer calls the store method, just as it calls BlockingQueue.put(m) in the synchronous approach, but this method also schedules the consumer's processing method on a common thread pool.
Update:
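A minimal Java sketch of the asynchronous consumer described above may help. The class and method names (AsyncConsumer, store, process) are made up for illustration and are not part of Akka or the JDK; the sketch also makes no attempt to preserve message order, since several pool threads may process messages at once.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical asynchronous consumer: it owns no thread, only a queue and two methods. */
class AsyncConsumer {
    private final Queue<String> inbox = new ConcurrentLinkedQueue<>();
    private final ExecutorService pool;
    final List<String> processed = new CopyOnWriteArrayList<>();

    AsyncConsumer(ExecutorService pool) {
        this.pool = pool;
    }

    /** Called by the producer: stores the message and schedules processing on the shared pool. */
    void store(String message) {
        inbox.add(message);
        pool.execute(this::process);
    }

    /** Runs on some pool thread and handles one stored message per invocation. */
    private void process() {
        String message = inbox.poll();
        if (message != null) {
            processed.add(message.toUpperCase());
        }
    }
}

class AsyncConsumerDemo {
    /** Stores three messages, waits for the pool to finish, and returns what was processed. */
    static List<String> run() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AsyncConsumer consumer = new AsyncConsumer(pool);
        consumer.store("a");
        consumer.store("b");
        consumer.store("c");
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumer.processed;
    }

    public static void main(String[] args) {
        System.out.println(run()); // the three messages in upper case, in no guaranteed order
    }
}
```

The producer never blocks in store(), and no thread is parked waiting for messages; pool threads are only occupied while a message is actually being processed.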
As for the 2nd question (why ever use an Akka Future):
Creating a Future looks (and is) simpler than creating an Actor; code for a chain of Futures is more compact and easier to follow than the equivalent Actor code.
Note, however, that a Future can pass only a single value (message), while an Actor can handle a sequence of messages. But sequences can be handled with Akka Streams. So the question arises: why ever use Akka Actors? I invite more experienced developers to answer this question. Generally, I think if your task can be solved with Futures, then use Futures, else if with Streams, use Streams, else if with Akka Actors, then use Actors, else look for another framework.
For the first part of your question, I agree with Alexei Kaigorodov's answer.
For the second part of your question:
It is useful to use a Future internally when actor responses need to be combined in a very specific way. For example, say the Master actor needs to perform several blocking database queries and then aggregate their results, so Master sends each query to a Worker and then aggregates the responses. If the query results can be aggregated in any order (e.g. Master is just summing row counts), then it makes sense for each Worker to send its results to Master via a callback. However, if the results need to be combined in a very specific order, then it is easier for each Worker to immediately return a Future and for Master to then manipulate these Futures in the correct order. This could be done via callbacks as well, but then Master would need to work out which query result is which in order to put them in the correct order, and the code would be much more difficult to optimize. For example, if the results of query1 can be immediately aggregated with the results of query2, then with a Future this logic can go directly into the dispatch code, where the identities of all queries are already known, whereas a callback would require Master to identify each query result and also determine whether it can be aggregated with any other query results that have already been returned.
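To make the "combine in a specific order" case concrete, here is a rough sketch in plain Java using CompletableFuture rather than Akka Futures (a deliberate substitution so the example needs no Akka dependency). The runQuery method is a hypothetical stand-in for a Worker running a blocking query; the artificial delays make the later queries finish first.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OrderedAggregation {
    /** Hypothetical Worker: pretends to run a blocking query; query 1 is the slowest. */
    static CompletableFuture<String> runQuery(int id, ExecutorService workers) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(50L * (3 - id));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "rows-of-query" + id;
        }, workers);
    }

    /** Dispatches three queries and combines the results strictly as query1, query2, query3. */
    static String aggregate() {
        ExecutorService workers = Executors.newFixedThreadPool(3);
        try {
            CompletableFuture<String> q1 = runQuery(1, workers);
            CompletableFuture<String> q2 = runQuery(2, workers);
            CompletableFuture<String> q3 = runQuery(3, workers);
            // The identity of each result is known here, at dispatch time:
            // q1 and q2 are merged as soon as both are done, then q3 is appended.
            return q1.thenCombine(q2, (a, b) -> a + "|" + b)
                     .thenCombine(q3, (ab, c) -> ab + "|" + c)
                     .join();
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(aggregate()); // rows-of-query1|rows-of-query2|rows-of-query3
    }
}
```

Even though query 3 completes first, the combined result keeps the dispatch order, and Master never has to inspect an incoming message to find out which query it belongs to.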
Related
In (Java) reactive programming, what is the difference between a Future<T> and a (Project Reactor) Mono<T>? Both seem to be means for accessing the result of an asynchronous computation at a time in the future when the computation is complete. Why introduce the Mono interface if Future already does the job?
The greatest difference is that a Mono<T> can be fully lazy, whereas when you get hold of a Future<T>, the underlying processing has already started.
With a typical cold Mono, nothing happens until you subscribe() to it, which makes it possible to pass the Mono around in the application and enrich it with operators along the way, before even starting the processing.
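The eager-versus-lazy distinction can be imitated with the JDK alone. The sketch below does not use Reactor: it stands in for a cold Mono with a Supplier that creates a CompletableFuture on demand, which is only an analogy and not how Mono is implemented. All class and method names are illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class EagerVersusDeferred {
    /** Returns {runs before the supplier is invoked, runs after its future completed}. */
    static int[] deferred() {
        AtomicInteger runs = new AtomicInteger();
        // Only a recipe so far: defining the supplier submits no task anywhere.
        Supplier<CompletableFuture<String>> recipe =
            () -> CompletableFuture.supplyAsync(() -> {
                runs.incrementAndGet();
                return "result";
            });
        int before = runs.get();
        recipe.get().join(); // invoking the supplier is what starts the work
        int after = runs.get();
        return new int[] { before, after };
    }

    /** An eagerly created future runs although nobody asks for its value; true if it ran. */
    static boolean eager() {
        CountDownLatch ran = new CountDownLatch(1);
        CompletableFuture.runAsync(ran::countDown); // the returned future is ignored on purpose
        try {
            return ran.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        int[] counts = deferred();
        System.out.println(counts[0] + " -> " + counts[1]); // 0 -> 1
        System.out.println(eager());
    }
}
```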
It is also far easier to keep things asynchronous using a Mono compared to a Future (where the API tends to drive you to call the blocking get()).
Finally, compared to both Future and CompletableFuture, the composition aspect is improved in Mono with the extensive vocabulary of operators it offers.
Producer and consumer can communicate in 2 ways: synchronous and asynchronous.
In the synchronous (pull-based) way, the consumer is a thread and some intermediate communicator object is used, usually a blocking queue. In the special case where only a single value is passed during the whole producer-consumer communication, a communicator that implements the Future interface can be used. This way is called synchronous because the consumer calls a communicating method like Future.get(), and that method waits until the value is available and then returns it as the result. That is, requesting the value and receiving it are programmed in the same statement, though these actions can be separated in time.
The drawback of synchronous communication is that while the consumer waits for the requested value, it ties up a considerable amount of memory for its thread stack. As a result, only a limited number of activities can wait for data at the same time; network connections serving multiple clients are one example. To increase that number, we can represent the consumer not as a thread but as a relatively small object whose methods are called by the producer or communicator when a datum for the consumer is available. This way is called asynchronous (push-based). It is split into two actions: a request to the producer to pass data, and the passing of that data to the consumer.
Now the reply to the question is: Future is able to act as a synchronous communicator only (with its get methods), while Mono can be used both as a synchronous communicator (with its block methods) and as an asynchronous one (with its subscribe methods).
Note that java.util.concurrent.CompletableFuture can also act as both a synchronous and an asynchronous communicator. Why have similar means of doing the same thing? This phenomenon is called "not invented here".
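As a small illustration of the last point, the sketch below uses one CompletableFuture in both roles. The class name and the values are made up; join() is used in place of get() only because it throws no checked exceptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

class BothModes {
    /** Synchronous (pull) use: the caller's thread waits until the value is available. */
    static String pull() {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> "value");
        return future.join();
    }

    /** Asynchronous (push) use: a callback is registered and invoked once the value exists. */
    static String push() {
        AtomicReference<String> seen = new AtomicReference<>();
        CompletableFuture<Void> done = CompletableFuture
            .supplyAsync(() -> "value")
            .thenAccept(v -> seen.set("got " + v));
        done.join(); // only so this demo can return the result; real push-style code would not wait
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(pull()); // value
        System.out.println(push()); // got value
    }
}
```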
I have Flux<URL>. How can I make multiple concurrent void requests for each URL (for example myWebClient.refresh(URL)), then (after all requests done) read data from the database and return Flux<MyAnyEntity> (for example repo.findAll())?
You can achieve that using Flux/Mono operators:
// get the URIs from somewhere
Flux<URI> uris = //...
Flux<MyAnyEntity> entities = uris
// map each URI to a HTTP client call and do nothing with the response
.flatMap(uri -> webClient.get().uri(uri).exchange().then())
// chain that call with a call on your repository
.thenMany(repo.findAll());
Update:
This code is naturally asynchronous and non-blocking, so all operations in the flatMap operator will be executed concurrently, according to the demand communicated by the consumer (this is the backpressure we're talking about).
If the Reactive Streams Subscriber requests N elements with request(N), then N requests might be executed concurrently. I don't think this is something you want to deal with directly, although you can influence things using windowing operators for micro-batching operations.
Using .subscribeOn(Schedulers.parallel()) will not improve concurrency in this case - as stated in the reference documentation you should only use that for CPU-bound work.
This is more of a Java concurrency design question. I’m working on an application that needs to process many messages for many different clients. If two messages have different client names, then they can be processed in parallel. However, if they have the same client name, then they need to be processed serially, in order.
What’s the best way to implement this?
My current implementation is pretty simple: I wrote a wrapper class called OrderedExecutorPool. It has a list of single-threaded executors. In its submit method, it does the following to figure out which executor to submit the task to:
// floorMod keeps the index non-negative even when hashCode() is Integer.MIN_VALUE
int executorNum = Math.floorMod(clientName.hashCode(), numExecutors);
executorList.get(executorNum).submit(task);
This ensures that all messages with same clients go to the same executor while still supporting processing messages for different clients in parallel.
There are a couple of problems with this design:
1.) If most client names have the same hash code, then only a few executors do all the work
2.) If one client has MANY messages, its single executor may not keep up
Is there an elegant solution to this problem that can fix the shortcomings above?
Edit
clientName is just a String. I'm just invoking the String.hashCode() method on it.
There is no JDK built-in solution that I know of. I've implemented a custom executor for this at my current job using this basic logic:
Keep an internal map of client name to work queue (each client has its own queue).
When work comes in for a client, add it to that client's queue.
If this is the first job on the queue, create a Runnable for this client name/queue and push it into the "real" executor (a standard JDK thread pool).
The Runnable implementation just consumes tasks from a single client queue until it is empty and then exits.
This simple implementation is the "greedy" approach (a client will keep working until its queue is empty). If you have more clients than underlying threads, you may want a more "fair" approach, where a client executes some number of tasks and then re-queues itself in the underlying executor (thus allowing other clients to get some work done).
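The steps above could be sketched roughly as follows. This is not the implementation referred to in the answer, only one possible reading of it: PerClientExecutor and its methods are invented names, it implements the "greedy" variant, and a client's queue is removed from the map once it runs empty so that the next submission starts a fresh drain task.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Runs tasks of the same client serially and tasks of different clients in parallel. */
class PerClientExecutor {
    private final Executor pool;
    // A client is present in this map exactly while a drain task for it is scheduled or running.
    private final Map<String, Queue<Runnable>> queues = new HashMap<>();

    PerClientExecutor(Executor pool) {
        this.pool = pool;
    }

    synchronized void submit(String clientName, Runnable task) {
        Queue<Runnable> queue = queues.get(clientName);
        if (queue != null) {
            queue.add(task); // a drain task is already active and will pick this up
            return;
        }
        queue = new ArrayDeque<>();
        queue.add(task);
        queues.put(clientName, queue);
        pool.execute(() -> drain(clientName)); // first job: start draining on the real pool
    }

    /** Greedy drain: keeps running this client's tasks until its queue is empty, then exits. */
    private void drain(String clientName) {
        while (true) {
            Runnable task;
            synchronized (this) {
                task = queues.get(clientName).poll();
                if (task == null) {
                    queues.remove(clientName);
                    return;
                }
            }
            try {
                task.run(); // run outside the lock so other clients are not held up
            } catch (RuntimeException e) {
                e.printStackTrace(); // a failing task must not stop later tasks of this client
            }
        }
    }
}

class PerClientExecutorDemo {
    /** Submits 100 numbered tasks for each of two clients and records the execution order. */
    static Map<String, List<Integer>> run() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        PerClientExecutor executor = new PerClientExecutor(pool);
        Map<String, List<Integer>> seen = new ConcurrentHashMap<>();
        seen.put("a", new CopyOnWriteArrayList<>());
        seen.put("b", new CopyOnWriteArrayList<>());
        CountDownLatch done = new CountDownLatch(200);
        for (int i = 0; i < 100; i++) {
            final int n = i;
            for (String client : List.of("a", "b")) {
                executor.submit(client, () -> {
                    seen.get(client).add(n);
                    done.countDown();
                });
            }
        }
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run().get("a")); // 0 through 99, in submission order
    }
}
```

Because tasks are keyed by client name rather than by hash bucket, hash collisions no longer funnel unrelated clients onto one thread. A single very busy client is still processed serially, which the ordering requirement makes unavoidable.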
I need to use the memcached Java API in my Scala/Akka code. This API gives you both synchronous and asynchronous methods. The asynchronous ones return java.util.concurrent.Future. There is a question here about dealing with Java Futures in Scala: "How do I wrap a java.util.concurrent.Future in an Akka Future?". However, in my case I have two options:
Use the synchronous API, wrap the blocking code in a Future and mark it as blocking:
Future {
blocking {
cache.get(key) //synchronous blocking call
}
}
Use the asynchronous Java API and poll the Java Future every n ms to check whether it has completed (as described in one of the answers to the linked question).
Which one is better? I am leaning towards the first option because polling can dramatically impact response times. Shouldn't the blocking { } block prevent the whole pool from being blocked?
I always go with the first option, but I do it in a slightly different way. I don't use the blocking feature. (Actually, I have not thought about it yet.) Instead, I provide a custom execution context to the Future that wraps the synchronous blocking call. So it looks basically like this:
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

val ecForBlockingMemcachedStuff = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(100)) // whatever number you think is appropriate
// I create a separate ec for each blocking client/resource/API I use
Future {
cache.get(key) // synchronous blocking call
}(ecForBlockingMemcachedStuff) // or mark the execution context implicit. I like to mention it explicitly.
So all the blocking calls use a dedicated execution context (= thread pool), which is separate from your main execution context responsible for non-blocking work.
This approach is also explained in an online training video for Play/Akka provided by Typesafe. There is a video in lesson 4 about how to handle blocking calls. It is explained by Nilanjan Raychaudhuri (hope I spelled it correctly), who is a well-known author of Scala books.
Update: I had a discussion with Nilanjan on Twitter. He explained the difference between the approach with blocking and a custom ExecutionContext. The blocking feature just creates a special ExecutionContext. It takes a naive approach to the question of how many threads you will need: it spawns a new thread whenever all the other existing threads in the pool are busy. So it is effectively an uncontrolled ExecutionContext. It could create lots of threads and lead to problems like an out-of-memory error. So the solution with the custom execution context is actually better, because it makes this problem obvious. Nilanjan also added that you need to consider circuit breaking for the case where this pool gets overloaded with requests.
TLDR: Yeah, blocking calls suck. Use a custom/dedicated ExecutionContext for blocking calls. Also consider circuit breaking.
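For code on the Java side of such an application, the same idea might look like the sketch below, which uses CompletableFuture with a dedicated bounded pool instead of a Scala Future with a custom ExecutionContext. The blockingGet method is a hypothetical stand-in for the synchronous cache.get(key) call, and the pool size of 8 is an arbitrary assumption.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BlockingCalls {
    // Dedicated, bounded pool used only for the blocking client (size chosen arbitrarily).
    // Daemon threads so that an idle pool does not keep the JVM alive.
    static final ExecutorService BLOCKING_POOL = Executors.newFixedThreadPool(8, runnable -> {
        Thread thread = new Thread(runnable, "memcached-blocking");
        thread.setDaemon(true);
        return thread;
    });

    /** Hypothetical stand-in for the synchronous, blocking cache.get(key). */
    static String blockingGet(String key) {
        try {
            Thread.sleep(20); // simulates waiting on network I/O
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-for-" + key;
    }

    /** Runs the blocking call on the dedicated pool, keeping the caller's threads free. */
    static CompletableFuture<String> getAsync(String key) {
        return CompletableFuture.supplyAsync(() -> blockingGet(key), BLOCKING_POOL);
    }

    public static void main(String[] args) {
        System.out.println(getAsync("user:42").join()); // value-for-user:42
    }
}
```

With a fixed-size pool, excess requests wait in the executor's queue instead of spawning more threads, which is where the circuit-breaking advice becomes relevant.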
The Akka documentation provides a few suggestions on how to deal with blocking calls:
In some cases it is unavoidable to do blocking operations, i.e. to put
a thread to sleep for an indeterminate time, waiting for an external
event to occur. Examples are legacy RDBMS drivers or messaging APIs,
and the underlying reason is typically that (network) I/O occurs under
the covers. When facing this, you may be tempted to just wrap the
blocking call inside a Future and work with that instead, but this
strategy is too simple: you are quite likely to find bottlenecks or
run out of memory or threads when the application runs under increased
load.
The non-exhaustive list of adequate solutions to the “blocking
problem” includes the following suggestions:
Do the blocking call within an actor (or a set of actors managed by a router), making sure to configure a thread pool which is either dedicated for this purpose or sufficiently sized.
Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded number of tasks of this nature will exhaust your memory or thread limits).
Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the hardware on which the application runs.
Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they occur as actor messages.
The first possibility is especially well-suited for resources which
are single-threaded in nature, like database handles which
traditionally can only execute one outstanding query at a time and use
internal synchronization to ensure this. A common pattern is to create
a router for N actors, each of which wraps a single DB connection and
handles queries as sent to the router. The number N must then be tuned
for maximum throughput, which will vary depending on which DBMS is
deployed on what hardware.
I have a use case as follows:
I need to split my computation among multiple threads, and all threads need to send their results back to the master thread quickly.
Flow
A search query is entered by the user
The query comes to Akka
The query needs to be distributed among a number of Akka actors
Each Akka actor does some kind of processing and returns its results to the parent actor
But each Akka actor is single-threaded, and I have multiple queries coming in at the same time.
How can I serve multiple queries quickly without making any query wait on its computation?
Is Akka suitable for this use case? If yes, how can I model it?
Akka is perfectly suited to this kind of application!
It is true that each actor is single-threaded. That is, each actor processes its own messages sequentially (one at a time) and synchronously (on a single thread). But you're free to create as many actors as you'd like, and those actors operate completely asynchronously from each other.
In other words, you can spawn a new actor for each query request. Each actor handles a single request in a safe, single threaded fashion, but as a whole you're handling multiple queries simultaneously.
For the use case you've described, I'd look into using akka-io for your IO layer and something like the balancing dispatcher pattern to divide the queries among workers.