I am reading the Akka (Java lib) docs and need clarification on some of the Akka/Actor best practices they recommend.
Actors should not block (i.e. passively wait while occupying a Thread) on some external entity...The blocking operations should be
done in some special-cased thread which sends messages to the actors which shall act on them.
So what does a code example of this look like in Akka/Java? If an Actor isn't an appropriate place to put code that has to block, then what does satisfy the definition of "some special-cased thread"?
Do not pass mutable objects between actors. In order to ensure that, prefer immutable messages.
I'm familiar with how to make immutable classes (no public setters, no public fields, make the class final, etc.). But does Akka have its own definition of an "immutable class", and if so, what is it?
Top-level actors are the innermost part of your Error Kernel...
I don't even know what this means! I understand what they mean by "top-level" actors (highest in the actor/manager/supervisor hierarchy), but what's an "Error Kernel", and how does it relate to actors?
I am able to answer only the first question (and in future, please place only one question in a post).
Consider, for example, a database connection, which is inherently blocking. To allow actors to use a database, the programmer should create a dedicated thread (or a thread pool) with a queue of database requests. A request contains a database statement and a reference to the actor that is to receive the result. The dedicated thread reads requests in a loop, accesses the database, sends the result to the referenced actor, and so on. The request queue is blocking: when there are no requests, the connection thread is blocked in the queue.take() operation.
So access to the database is split between two actors: one places a request in the queue, and the other handles the result.
UPDATE: Java code sketch (I am not strong in Scala).
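import java.sql.ResultSet;
import java.util.concurrent.LinkedBlockingQueue;
import akka.actor.ActorRef;
class Request {
    final String query;
    final ActorRef handler; // the actor that will receive the result
    Request(String query, ActorRef handler) { this.query = query; this.handler = handler; }
}
abstract class DatabaseConnector implements Runnable {
    private final LinkedBlockingQueue<Request> queue = new LinkedBlockingQueue<>();
    private final Thread t = new Thread(this, "db-connector"); // the dedicated blocking thread
    public void start() { t.start(); }
    // Called by actors; offer() never blocks on an unbounded queue
    public void sendRequest(Request r) {
        queue.offer(r);
    }
    public void run() {
        try {
            for (;;) {
                Request r = queue.take(); // waits here while there are no requests
                ResultSet res = doBlockingCallToJdbc(r.query);
                // a ResultSet is mutable and tied to its connection; in practice, copy the rows into an immutable message
                r.handler.tell(res, ActorRef.noSender()); // sendOneWay in old Akka 1.x
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // interrupting the thread stops the loop
        }
    }
    // The actual blocking JDBC call, supplied by a subclass
    protected abstract ResultSet doBlockingCallToJdbc(String query);
}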
Here is the answer to your third question (the Error Kernel), straight from the Akka docs:
If one actor carries very important data (i.e. its state shall not be
lost if avoidable), this actor should source out any possibly
dangerous sub-tasks to children it supervises and handle failures of
these children as appropriate. Depending on the nature of the
requests, it may be best to create a new child for each request, which
simplifies state management for collecting the replies. This is known
as the “Error Kernel Pattern” from Erlang.
So the phrase you're asking about means that these actors are the "last line of defence" against errors in your supervision hierarchy, so they should be strong and powerful guys (commandos) rather than weak workers. And the fewer commandos you have, the easier it is to manage them and to avoid a mess at the top level. More precisely, the number of commandos should be close to the number of business protocols you have (to stick with superheroes: one for Iron Man, one for Hulk, etc.).
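To make the idea concrete without Akka itself, here is a plain-Java analogy (not Akka's supervision API): the hypothetical `ErrorKernel` class keeps the state that must not be lost, confined to its own single thread much like an actor, and hands every risky request to a fresh, disposable "child" task on another pool. A child that throws is only counted; the kernel's collected replies survive. The class and method names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical error kernel: it owns the important state and never runs risky code itself.
class ErrorKernel {
    private final ExecutorService kernelThread = Executors.newSingleThreadExecutor(); // serial, like an actor
    private final ExecutorService children = Executors.newCachedThreadPool();         // disposable workers
    private final List<String> replies = new ArrayList<>(); // precious state, touched only on kernelThread
    private int failedChildren = 0;                          // also touched only on kernelThread

    // One fresh "child" task per request; its outcome is handled back on the kernel's own thread.
    CompletableFuture<Void> handle(String request, Function<String, String> riskyWork) {
        return CompletableFuture
                .supplyAsync(() -> riskyWork.apply(request), children)
                .handleAsync((reply, failure) -> {
                    if (failure == null) {
                        replies.add(reply);   // child succeeded: keep its reply
                    } else {
                        failedChildren++;     // child crashed: record it, kernel state survives
                    }
                    return null;
                }, kernelThread);
    }

    // Read only after the futures returned by handle() have completed.
    List<String> replies() { return replies; }
    int failedChildren() { return failedChildren; }

    void shutdown() { kernelThread.shutdown(); children.shutdown(); }
}
```

A real Akka supervisor would also decide whether to restart, resume or stop the failed child; this sketch only records the failure.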
The Akka documentation also has a good explanation of how to manage blocking operations.
Speaking of which:
If an Actor isn't an appropriate place to put code that has to block, then what does satisfy the definition of "some special-cased thread"?
An Actor definitely doesn't, because Akka guarantees only sequential processing: your messages may be processed on any thread (the dispatcher just picks up a free thread from the pool), even for a single actor. Blocking operations are not recommended there (at least not on the same thread pool as regular actors) because they may lead to performance problems or even deadlocks. See, for instance, the explanation for Spray (which is based on Akka): Spray.io: When (not) to use non-blocking route handling?
You can think of it as Akka requiring you to interact only with asynchronous APIs. You can use a Future to convert a synchronous call into an asynchronous one: just send the response from your database as a message to the actor. Example in Scala:
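import akka.actor.Actor
import akka.pattern.pipe
import scala.concurrent.Future
class QueryActor extends Actor { // hypothetical actor name
  import context.dispatcher // ExecutionContext for the Future; a dedicated pool is better for blocking calls
  def receive = { // the Scala counterpart of Java's onReceive
    case query: Query => // the message, safely cast to Query by pattern matching
      Future { // this block (the handler) is passed to the Future
        // and executed on a separate thread:
        doBlockingCallToJdbc(query)
      } pipeTo sender() // i.e. `sender() ! futureResult` once the future completes
  }
}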
Other approaches are described in the same document (Akka Doc)
I have to write to a file based on incoming requests. As multiple requests may arrive simultaneously, I don't want multiple threads trying to overwrite the file content at the same time, which could lose some data.
Hence, I tried collecting all the requests' data in a PublishSubject instance variable. I subscribe to publishSubject during init, and this subscription remains for the whole life cycle of the application. I also observe the same instance on a separate thread (provided by the Vert.x event loop), which invokes the method responsible for writing the file.
private PublishSubject<FileData> publishSubject = PublishSubject.create();
private void init() {
publishSubject.observeOn(RxHelper.blockingScheduler(vertx)).subscribe(fileData -> writeData(fileData));
}
Later during request handling, I call onNext as below:
private void handleRequest() {
    // do some task that produces fileData
    publishSubject.onNext(fileData);
}
I understand that when I call onNext, the data is queued up to be written to the file by the specific thread assigned by the observeOn operator. What I'm trying to understand is whether this thread sits blocked in the WAITING state for this task only, or whether it will also be used for other work when no file writing is happening.
I don't want this approach to leave one thread from the Vert.x event loop wasted in a waiting state. Also, please suggest a better approach if one is available.
Thanks in advance.
Actually, RxJava does this for you: by definition, onNext() emissions are delivered serially:
Observables must issue notifications to observers serially (not in parallel). They may issue these notifications from different threads, but there must be a formal happens-before relationship between the notifications. (Observable Contract)
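So as long as you run the blocking calls inside onNext() in the subscriber (and don't manually fork work onto a different thread), you will be fine, and no parallel writes will happen. (The contract also binds the caller: if onNext() on the subject may be called from several threads at once, wrap the subject with toSerialized() first.)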
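The same serialization can be sketched without RxJava, as a swapped-in plain `java.util.concurrent` technique: every write is funnelled through one single-threaded executor, which plays roughly the role of the thread picked by observeOn. `SerialWriter` is a hypothetical name, and the `Consumer<String>` sink stands in for the method that actually appends to the file.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Funnels every write through one thread, so the sink is never called concurrently.
class SerialWriter {
    private final ExecutorService writerThread = Executors.newSingleThreadExecutor();
    private final Consumer<String> sink; // e.g. a method that appends one line to the file

    SerialWriter(Consumer<String> sink) { this.sink = sink; }

    // Safe to call from many request threads at once; it only enqueues and returns.
    void write(String data) {
        writerThread.execute(() -> sink.accept(data));
    }

    // Waits for every write queued so far (tasks run in submission order), then stops the thread.
    void close() {
        try {
            writerThread.submit(() -> { }).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            writerThread.shutdown();
        }
    }
}
```

Because only the writer thread ever calls the sink, the sink itself needs no locking. The trade-off is that this one thread is dedicated to writes and waits on the executor's queue when nothing arrives, which is the cost the question is asking about.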
Actually, your worries should come from the opposite direction: backpressure.
You should choose a backpressure strategy here: if requests come in faster than you can process them (write them to the file), you might overflow the buffer and get into trouble. Consider using Flowable and choose your backpressure strategy according to your needs.
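As a rough plain-Java analogy of two backpressure strategies (this is not RxJava's Flowable API), a bounded buffer can either drop items when it is full or make the producer wait. `BoundedWriteBuffer` and its method names are hypothetical; in the Rx setting, the consumer side would be the thread doing the file writes.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Bounded buffer between fast producers (request handlers) and a slow consumer (the file writer).
class BoundedWriteBuffer {
    private final BlockingQueue<String> buffer;
    private final AtomicInteger dropped = new AtomicInteger();

    BoundedWriteBuffer(int capacity) { buffer = new ArrayBlockingQueue<>(capacity); }

    // "Drop" strategy: never blocks the caller; excess items are discarded and counted.
    boolean offerOrDrop(String item) {
        boolean accepted = buffer.offer(item);
        if (!accepted) dropped.incrementAndGet();
        return accepted;
    }

    // "Wait" strategy: the caller blocks until the consumer frees a slot.
    void putOrWait(String item) throws InterruptedException {
        buffer.put(item);
    }

    // Consumer side: take the next item, waiting while the buffer is empty.
    String take() throws InterruptedException { return buffer.take(); }

    int dropped() { return dropped.get(); }
    int size() { return buffer.size(); }
}
```

Dropping keeps producers fast but loses data, which is what the question wants to avoid; waiting keeps every write but stalls the producer, which must not happen on a Vert.x event-loop thread. Which trade-off fits depends on your requirements.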
Regarding your questions, that depends on the Scheduler you're using, RxHelper.blockingScheduler(vertx), so I can't say for sure; if the scheduler uses shared threads in a work-queue fashion, the thread will not stay idle.
In any case, Rx does not determine this for you: it is the scheduler's responsibility to assign work to threads according to its own logic.
I have an Actor that - in its very essence - maintains a list of objects. It has three basic operations, an add, update and a remove (where sometimes the remove is called from the add method, but that aside), and works with a single collection. Obviously, that backing list is accessed concurrently, with add and remove calls interleaving each other constantly.
My first version used a ListBuffer, but I read somewhere it's not meant for concurrent access. I haven't gotten concurrent access exceptions, but I did note that finding & removing objects from it does not always work, possibly due to concurrency.
I was halfway through rewriting it to use a var List, but removing items from Scala's default immutable List is a bit of a pain, and I doubt it's suitable for concurrent access.
So, basic question: What collection type should I use in a concurrent access situation, and how is it used?
(Perhaps secondary: Is an Actor actually a multithreaded entity, or is that just my wrong conception and does it process messages one at a time in a single thread?)
(Tertiary: In Scala, what collection type is best for inserts and random access (delete / update)?)
Edit: To the kind responders: Excuse my late reply, I'm making a nasty habit out of dumping a question on SO or mailing lists, then moving on to the next problem, forgetting the original one for the moment.
Take a look at the scala.collection.mutable.Synchronized* traits/classes.
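The idea is that you mix the Synchronized traits into regular mutable collections to get synchronized versions of them. (Note that these traits were deprecated in Scala 2.11 and removed in 2.13; on newer versions, java.util.concurrent collections such as ConcurrentHashMap are the usual alternative.)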
For example:
import scala.collection.mutable._
val syncSet = new HashSet[Int] with SynchronizedSet[Int]
val syncArray = new ArrayBuffer[Int] with SynchronizedBuffer[Int]
You don't need to synchronize the state of the actors. The aim of the actors is to avoid tricky, error prone and hard to debug concurrent programming.
The actor model ensures that the actor consumes messages one by one and that you will never have two threads consuming messages for the same actor.
Scala's immutable collections are suitable for concurrent usage.
As for actors, a couple of things are guaranteed, as explained in the Akka documentation:
the actor send rule: where the send of the message to an actor happens before the receive of the same actor.
the actor subsequent processing rule: where processing of one message happens before processing of the next message by the same actor.
You are not guaranteed that the same thread processes the next message, but you are guaranteed that the current message will finish processing before the next one starts, and also that at any given time, only one thread is executing the receive method.
So that takes care of a given Actor's persistent state. With regard to shared data, the best approach as I understand it is to use immutable data structures and lean on the Actor model as much as possible. That is, "do not communicate by sharing memory; share memory by communicating."
What collection type should I use in a concurrent access situation, and how is it used?
See #hbatista's answer.
Is an Actor actually a multithreaded entity, or is that just my wrong conception and does it process messages one at a time in a single thread?
The second (though the thread on which messages are processed may change, so don't store anything in thread-local data). That's how the actor can maintain invariants on its state.
I am trying to understand when to use Akka Futures and found this article to be a little bit more helpful than the main Akka docs. So it looks like Akka Futures do exactly the same thing as Java 7 Futures. So I ask:
Outside the context of an actor system, what benefits do Akka Futures have over Java Futures? When to use each?
Within the context of an actor system, why ever use an Akka Future? Aren't all actor-to-actor messages asynchronous, concurrent and non-blocking?
Akka Futures implement an asynchronous way of communicating, while Java 7 Futures implement a synchronous approach. Yes, they do the same thing (communication), but in quite different ways.
A producer-consumer pair can interact in two ways: synchronously and asynchronously. The synchronous way assumes the consumer has its own thread and performs a blocking operation to get the next produced message, e.g. BlockingQueue.take(). In the asynchronous approach, the consumer does not own a thread; it is just an object with at least two methods: one to store a message and one to process it. The producer calls the store method, just as it calls Queue.put(m) in the synchronous approach, but this method also initiates execution of the consumer's processing method on a common thread pool.
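The asynchronous variant can be sketched in plain Java as a minimal mailbox. This illustrates the idea (roughly how an actor mailbox gets scheduled), not Akka's actual implementation, and `AsyncConsumer`, `store` and `drain` are hypothetical names. The consumer owns no thread: `store()` enqueues the message and, if no drain is already pending, schedules one on the shared pool, so messages are processed one at a time even though successive drains may run on different pool threads.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Asynchronous consumer: owns no thread; store() schedules processing on a shared pool.
class AsyncConsumer<M> {
    private final ConcurrentLinkedQueue<M> mailbox = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final Executor pool;
    private final Consumer<M> process;

    AsyncConsumer(Executor pool, Consumer<M> process) { this.pool = pool; this.process = process; }

    // The "store" method: called by producers, never blocks.
    void store(M message) {
        mailbox.offer(message);
        trySchedule();
    }

    private void trySchedule() {
        // At most one drain task is in flight, so process() never runs concurrently.
        if (!mailbox.isEmpty() && scheduled.compareAndSet(false, true)) {
            pool.execute(this::drain);
        }
    }

    private void drain() {
        M m;
        while ((m = mailbox.poll()) != null) {
            process.accept(m);
        }
        scheduled.set(false);
        trySchedule(); // a message may have arrived after the last poll()
    }
}
```

The volatile hand-off through `scheduled` also makes state changed by one drain visible to the next, even when the next drain runs on another thread.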
UPDATE
As for the 2nd question (why ever use an Akka Future):
Future creation looks (and is) simpler than Actor's; code for a chain of Futures is more compact and more demonstrable than that of Actors.
Note, however, that a Future can pass only a single value (message), while an Actor can handle a sequence of messages. But sequences can be handled with Akka Streams. So the question arises: why ever use Akka Actors? I invite more experienced developers to answer this question. In general, I think that if your task can be solved with Futures, use Futures; else, if it can be solved with Streams, use Streams; else, if with Akka Actors, use Actors; otherwise, look for another framework.
For the first part of your question, I agree with Alexei Kaigorodov's answer.
For the second part of your question:
It is useful to use a Future internally when actor responses need to be combined in a very specific way. For example, let's say that the Master actor needs to perform several blocking database queries and then aggregate their results, so Master sends each query to a Worker and then aggregates the responses. If the query results can be aggregated in any order (e.g. Master is just summing row counts), then it makes sense for Worker to send its results to Master via a callback. However, if the results need to be combined in a very specific order, it is easier for each Worker to immediately return a Future and for Master to then manipulate these Futures in the correct order. This could be done via callbacks as well, but then Master would need to figure out which query result is which to put them in the correct order, and it would be much more difficult to optimize the code. For example, if the results of query1 can be immediately aggregated with the results of query2, then with Futures this logic can go directly into the dispatch code, where the identities of all queries are already known, whereas with callbacks Master would have to identify each query result and also determine whether it can be aggregated with any other results that have already been returned.
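In Java terms, the ordered aggregation can be sketched with `CompletableFuture` standing in for Akka's Future. `OrderedAggregation` and `runQuery` are hypothetical, and `runQuery` only simulates a slow query. Every query is dispatched at once and the futures are kept in dispatch order, so results come back in that order even though later queries finish first here; `thenCombine` shows how two specific results can be aggregated as soon as both are available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.stream.Collectors;

class OrderedAggregation {
    // Hypothetical stand-in for a blocking query; in this simulation later queries finish sooner.
    static String runQuery(int i) {
        try { Thread.sleep(50L * (3 - i)); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "result" + i;
    }

    // Dispatch all queries at once; the list position identifies each query, so order is free.
    static CompletableFuture<List<String>> aggregateInOrder(ExecutorService workers) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            final int q = i;
            futures.add(CompletableFuture.supplyAsync(() -> runQuery(q), workers));
        }
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(v -> futures.stream().map(CompletableFuture::join).collect(Collectors.toList()));
    }

    // Combine two specific queries as soon as both are done, without waiting for the rest.
    static CompletableFuture<String> combineFirstTwo(CompletableFuture<String> q1, CompletableFuture<String> q2) {
        return q1.thenCombine(q2, (a, b) -> a + "+" + b);
    }
}
```

Inside an actor you would not call join() on the combined future; you would pipe its result back to the actor as a message.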
I need to use the memcached Java API in my Scala/Akka code. This API gives you both synchronous and asynchronous methods. The asynchronous ones return java.util.concurrent.Future. There is a question here about dealing with Java Futures in Scala: How do I wrap a java.util.concurrent.Future in an Akka Future?. However, in my case I have two options:
Use the synchronous API, wrapping the blocking code in a Future and marking it with blocking:
Future {
blocking {
cache.get(key) //synchronous blocking call
}
}
Use the asynchronous Java API and poll the Java Future every n ms to check whether it has completed (as described in one of the answers to the linked question).
Which one is better? I am leaning towards the first option because polling can dramatically impact response times. Shouldn't the blocking { } block prevent the whole pool from being blocked?
I always go with the first option, but I do it in a slightly different way: I don't use the blocking feature (actually, I have not thought about it yet). Instead, I provide a custom execution context to the Future that wraps the synchronous blocking call. So it looks basically like this:
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
val ecForBlockingMemcachedStuff = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(100)) // whatever number you think is appropriate
// I create a separate ec for each blocking client/resource/API I use
Future {
  cache.get(key) // synchronous blocking call
}(ecForBlockingMemcachedStuff) // or mark the execution context implicit; I like to pass it explicitly
So all the blocking calls use a dedicated execution context (= thread pool), separate from your main execution context, which is responsible for non-blocking work.
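The same setup in Java, as a sketch with `CompletableFuture` and a dedicated fixed-size pool. The pool size of 16 is an arbitrary assumption, and `BlockingCacheClient` and `blockingCacheGet` are hypothetical; the latter only simulates the synchronous memcached call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BlockingCacheClient {
    // Dedicated, bounded pool used only for this blocking resource (size is an assumption to tune).
    private final ExecutorService blockingPool = Executors.newFixedThreadPool(16);

    // Hypothetical stand-in for a synchronous, blocking cache lookup.
    private String blockingCacheGet(String key) {
        try { Thread.sleep(10); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "value-of-" + key;
    }

    // Non-blocking for the caller: the blocking work runs on blockingPool, not on the caller's pool.
    CompletableFuture<String> get(String key) {
        return CompletableFuture.supplyAsync(() -> blockingCacheGet(key), blockingPool);
    }

    void shutdown() { blockingPool.shutdown(); }
}
```

Note that `Executors.newFixedThreadPool` bounds the number of threads but not its task queue, so under overload requests pile up in memory; that is where the circuit breaking mentioned in the update comes in.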
This approach is also explained in an online training video for Play/Akka provided by Typesafe. There is a video in lesson 4 about how to handle blocking calls. It is explained by Nilanjan Raychaudhuri (hope I spelled it correctly), who is a well-known author of Scala books.
Update: I had a discussion with Nilanjan on Twitter. He explained the difference between the approach with blocking and a custom ExecutionContext. The blocking feature just creates a special ExecutionContext. It provides a naive answer to the question of how many threads you will need: it spawns a new thread whenever all the existing threads in the pool are busy. So it is actually an uncontrolled ExecutionContext. It could create lots of threads and lead to problems like an out-of-memory error. So the solution with the custom execution context is actually better, because it makes this problem obvious. Nilanjan also added that you need to consider circuit breaking in case this pool gets overloaded with requests.
TL;DR: Yes, blocking calls suck. Use a custom/dedicated ExecutionContext for blocking calls, and consider circuit breaking.
The Akka documentation provides a few suggestions on how to deal with blocking calls:
In some cases it is unavoidable to do blocking operations, i.e. to put
a thread to sleep for an indeterminate time, waiting for an external
event to occur. Examples are legacy RDBMS drivers or messaging APIs,
and the underlying reason is typically that (network) I/O occurs under
the covers. When facing this, you may be tempted to just wrap the
blocking call inside a Future and work with that instead, but this
strategy is too simple: you are quite likely to find bottlenecks or
run out of memory or threads when the application runs under increased
load.
The non-exhaustive list of adequate solutions to the “blocking
problem” includes the following suggestions:
Do the blocking call within an actor (or a set of actors managed by a router), making sure to configure a thread pool which is either
dedicated for this purpose or sufficiently sized.
Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded
number of tasks of this nature will exhaust your memory or thread
limits).
Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the
hardware on which the application runs.
Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they
occur as actor messages.
The first possibility is especially well-suited for resources which
are single-threaded in nature, like database handles which
traditionally can only execute one outstanding query at a time and use
internal synchronization to ensure this. A common pattern is to create
a router for N actors, each of which wraps a single DB connection and
handles queries as sent to the router. The number N must then be tuned
for maximum throughput, which will vary depending on which DBMS is
deployed on what hardware.
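The second suggestion in the quoted list (an upper bound on the number of concurrent blocking calls) can be sketched in plain Java with a `Semaphore`. `BoundedBlockingCalls` and `maxInFlight` are hypothetical names, and failing fast with `tryAcquire()` is only one possible policy; callers could also be made to wait.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class BoundedBlockingCalls {
    private final Semaphore permits;
    private final Executor executor;

    BoundedBlockingCalls(int maxInFlight, Executor executor) {
        this.permits = new Semaphore(maxInFlight);
        this.executor = executor;
    }

    // Runs the blocking call asynchronously, but never more than maxInFlight at a time.
    <T> CompletableFuture<T> submit(Supplier<T> blockingCall) {
        if (!permits.tryAcquire()) {
            // Over the limit: fail fast instead of queueing an unbounded number of tasks.
            CompletableFuture<T> rejected = new CompletableFuture<>();
            rejected.completeExceptionally(new RejectedExecutionException("too many blocking calls in flight"));
            return rejected;
        }
        try {
            return CompletableFuture.supplyAsync(blockingCall, executor)
                    .whenComplete((result, error) -> permits.release()); // free the slot when done
        } catch (RuntimeException e) {
            permits.release(); // the executor refused the task
            throw e;
        }
    }
}
```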
I am using protobuf to implement a communication protocol between a Java application and a native application written in C++. The messages are event-driven: when an event occurs in the C++ application, a protobuf message is constructed and sent.
message MyInterProcessMessage {
int32 id = 1;
message EventA { ... }
message EventB { ... }
...
}
In Java I receive on my socket an object of the class: MyInterProcessMessageProto. From this I can get my data very easily since they are encapsulated into each other: myMessage.getEventA().getName();
I am facing two problems:
How to delegate the processing of the received messages?
Analysing the whole message and distinguishing the different event types and the actions they imply resulted in a huge, unmaintainable method with many if-cases.
I would like to find a pattern where I can preserve the messages and not only apply them but also undo them, the way the Command pattern is used for this.
My first approach would be to create different wrapper classes for each event, each with its own apply() and undo() methods, and delegate the job this way.
However, I am not sure whether this is the right way or whether there are better solutions.
To clarify my application:
The Java application models a running Java Virtual Machine and holds information, for instance Threads, Monitors, Memory, etc.
Every event changes the current state of the modeled JVM: for instance, a new thread was launched, another thread went into a blocking state, memory was freed, etc. The events are modeled the same way: ThreadEvent, MemoryEvent, etc.
This means the messages have to be processed sequentially. To be able to go back to previous states of the JVM, I would like to implement this undo functionality.
For undo, I already tried calling clearAllStates and then applying events up to event #i.
Unfortunately, with 20,000+ events this is totally inefficient.
To provide a tailored answer, it would be good to know what you're doing with received messages, whether they can be processed concurrently or not, and how an undo impacts the processing of messages received after an undone message.
However, here's a generic suggestion: a typical approach is to delegate received messages to a queue-like handler class, which usually runs in its own thread (so that the message receiver is ready for the next incoming message as soon as possible) and processes received messages sequentially. You could use a stack-like class to keep track of processed messages for the sake of the undo feature. You could also use specific queues and stacks for different event types.
Basically this resembles the thread pool pattern.
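A minimal sketch of the apply/undo idea from the question combined with a history stack; `JvmModel`, `JvmEvent`, `ThreadStateChanged` and `EventProcessor` are hypothetical names standing in for classes built from the protobuf messages, and the handler thread is left out. Each event records what it overwrote, so stepping back one event costs a single undo() instead of replaying everything from the start.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the observed JVM's state.
class JvmModel {
    final Map<Long, String> threadStates = new HashMap<>();
}

// Command pattern: each event can apply itself and revert exactly that change.
interface JvmEvent {
    void apply(JvmModel model);
    void undo(JvmModel model);
}

// Example event: a thread changes state (no previous state means the thread was just started).
final class ThreadStateChanged implements JvmEvent {
    private final long threadId;
    private final String newState;
    private String previousState; // captured on apply, used by undo

    ThreadStateChanged(long threadId, String newState) { this.threadId = threadId; this.newState = newState; }

    public void apply(JvmModel model) { previousState = model.threadStates.put(threadId, newState); }

    public void undo(JvmModel model) {
        if (previousState == null) model.threadStates.remove(threadId);
        else model.threadStates.put(threadId, previousState);
    }
}

// Applies events sequentially and keeps a history stack for undo.
class EventProcessor {
    private final JvmModel model = new JvmModel();
    private final Deque<JvmEvent> history = new ArrayDeque<>();

    void process(JvmEvent e) { e.apply(model); history.push(e); }

    // Steps back one event; returns false when there is nothing left to undo.
    boolean undoLast() {
        JvmEvent e = history.poll();
        if (e == null) return false;
        e.undo(model);
        return true;
    }

    JvmModel model() { return model; }
}
```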