I have a use case as follows:
I need to split my computation among multiple threads, and all threads need to send their results back to the master thread quickly.
Flow
A search query is entered by the user.
The query comes to Akka.
The query needs to be distributed among a number of Akka actors.
Each Akka actor will do some kind of processing and return its results to the parent actor.
But each Akka actor is single-threaded, and I have multiple queries coming in at the same time.
How can I serve multiple queries quickly without making any query wait on another query's computation?
Is akka suitable for this use case? If yes how can I model it?
Akka is perfectly suited to this kind of application!
It is true that each actor is single-threaded. That is, each actor processes its own messages sequentially (one at a time) and synchronously (on a single thread). But you're free to create as many actors as you'd like, and those actors operate completely asynchronously from each other.
In other words, you can spawn a new actor for each query request. Each actor handles a single request in a safe, single-threaded fashion, but as a whole you're handling multiple queries simultaneously.
For the use case you've described, I'd look into using akka-io for your IO layer and something like the balancing dispatcher pattern to divide the queries among workers.
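As a rough illustration of the actor-per-query idea (not of akka-io or the balancing dispatcher specifically), here is a minimal sketch using the classic Akka Java API; the Query, Result, QueryMaster and QueryWorker names are made up for the example:

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.Props;

// Hypothetical messages: a Query coming in and a Result going back out.
class Query { final String text; Query(String text) { this.text = text; } }
class Result { final String payload; Result(String payload) { this.payload = payload; } }

// Parent actor: spawns a fresh worker per incoming query, so queries never wait on each other.
class QueryMaster extends AbstractActor {
    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(Query.class, q -> {
                // one short-lived worker per query; the original sender gets the reply directly
                ActorRef worker = getContext().actorOf(Props.create(QueryWorker.class));
                worker.tell(q, getSender());
            })
            .build();
    }
}

// Worker actor: single-threaded internally, but many of them run concurrently.
class QueryWorker extends AbstractActor {
    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(Query.class, q -> {
                Result r = new Result("processed: " + q.text);  // do the real work here
                getSender().tell(r, getSelf());
                getContext().stop(getSelf());                   // worker is done
            })
            .build();
    }
}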
Related
In my J2EE web app, for every Web API call I have to notify an isolated thread that counts the number of calls. Possibilities include:
a) Use an AtomicLong. I think that would cause contention if I have millions of calls per minute, since all the threads would be trying to update a single variable.
b) Use a shared queue. Every request-processing thread inserts into the queue, and a dedicated counter thread dequeues from it and increments the count.
c) Use the actor model, e.g. with the Akka library. Send an asynchronous message to the actor, which adds it to the count.
My question is how method (b) compares to (c). What are the pros and cons, and how do they differ at a low level?
In your case I believe the actor model is the better option.
Pros with Akka:
The actor model with Akka takes care of thread management for you and is easy to implement.
Further, if in the future you want to implement a counter for a different kind of request, you can simply add a new actor for that.
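For option (c), a minimal sketch of what such a counter actor could look like with the classic Akka Java API (CounterActor, Increment and counterRef are hypothetical names):

import akka.actor.AbstractActor;

// A minimal counter actor: every Web API call fires a message at it, and the
// counter state is only ever touched by this one actor, so no locks or atomics.
class CounterActor extends AbstractActor {
    private long count = 0;

    static final class Increment {}          // hypothetical message type

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(Increment.class, msg -> count++)
            .build();
    }
}

// Request-processing threads just do:
//   counterRef.tell(new CounterActor.Increment(), ActorRef.noSender());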
There is a similar question at:
When to use actors instead of messaging solutions such as WebSphere MQ or Tibco Rendezvous?
I am trying to understand when to use Akka Futures and found this article to be a little bit more helpful than the main Akka docs. So it looks like Akka Futures do exactly the same thing as Java 7 Futures. So I ask:
Outside the context of an actor system, what benefits do Akka Futures have over Java Futures? When to use each?
Within the context of an actor system, why ever use an Akka Future? Aren't all actor-to-actor messages asynchronous, concurrent and non-blocking?
Akka Futures implement an asynchronous style of communication, while Java 7 Futures implement a synchronous one. Yes, they do the same thing - communication - but in quite different ways.
A producer-consumer pair can interact in two ways: synchronously or asynchronously. The synchronous way assumes the consumer has its own thread and performs a blocking operation to get the next produced message, e.g. BlockingQueue.take(). In the asynchronous approach, the consumer does not own a thread; it is just an object with at least two methods: one to store a message and one to process it. The producer calls the store method, just as it calls Queue.put(m) in the synchronous approach, but this method also initiates execution of the consumer's processing method on a common thread pool.
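To make the two interaction styles concrete, here is a small sketch using only java.util.concurrent (the class and method names are made up for the example): a consumer that owns a thread and blocks on take(), versus a consumer object whose store method schedules processing on a shared pool.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;

public class SyncVsAsync {
    // Synchronous style: the consumer owns a thread and blocks on take().
    static void startSynchronousConsumer(BlockingQueue<String> queue) {
        Thread consumerThread = new Thread(() -> {
            try {
                while (true) {
                    String message = queue.take();   // blocks until a message arrives
                    System.out.println("sync consumed: " + message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumerThread.start();
    }

    // Asynchronous style: the consumer is just an object; "storing" a message
    // also schedules its processing on a shared pool, so no dedicated thread.
    static class AsyncConsumer {
        private final ExecutorService sharedPool;
        AsyncConsumer(ExecutorService sharedPool) { this.sharedPool = sharedPool; }

        void store(String message) {                 // called by the producer
            sharedPool.execute(() -> process(message));
        }
        private void process(String message) {
            System.out.println("async consumed: " + message);
        }
    }
}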
UPDATE
As for the 2nd question (why ever use an Akka Future):
Creating a Future looks (and is) simpler than creating an Actor; code for a chain of Futures is more compact and easier to follow than the equivalent Actor code.
Note, however, that a Future can pass only a single value (message), while an Actor can handle a sequence of messages. But sequences can be handled with Akka Streams. So the question arises: why ever use Akka Actors? I invite more experienced developers to answer this question. Generally, I think if your task can be solved with Futures, then use Futures; else, if it can be solved with Streams, use Streams; else, if it can be solved with Akka Actors, use Actors; otherwise look for another framework.
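As a small illustration of the "chain of Futures" point above, here is a sketch using Akka's Java futures API (akka.dispatch.Futures, Mapper, OnComplete); treat it as an approximation, since details vary between Akka versions:

import akka.actor.ActorSystem;
import akka.dispatch.Futures;
import akka.dispatch.Mapper;
import akka.dispatch.OnComplete;
import scala.concurrent.ExecutionContext;
import scala.concurrent.Future;

public class FutureChain {
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("demo");
        ExecutionContext ec = system.dispatcher();

        // A tiny pipeline: produce a value, transform it, then react to the result.
        Future<String> raw = Futures.future(() -> "42", ec);

        Future<Integer> parsed = raw.map(new Mapper<String, Integer>() {
            public Integer apply(String s) { return Integer.parseInt(s); }
        }, ec);

        parsed.onComplete(new OnComplete<Integer>() {
            public void onComplete(Throwable failure, Integer result) {
                if (failure == null) System.out.println("got " + result);
                else failure.printStackTrace();
                system.terminate();
            }
        }, ec);
    }
}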
For the first part of your question, I agree with Alexei Kaigorodov's answer.
For the second part of your question:
It is useful to use a Future internally when actor responses need to be combined in a very specific way. For example, say the Master actor needs to perform several blocking database queries and then aggregate their results, so Master sends each query to a Worker and then aggregates the responses.
If the query results can be aggregated in any order (e.g. Master is just summing row counts), then it makes sense for each Worker to send its results to Master via a callback. However, if the results need to be combined in a very specific order, then it is easier for each Worker to immediately return a Future and for Master to manipulate those Futures in the correct order.
This could be done via callbacks as well, but then Master would need to figure out which query result is which in order to put them in the correct order, and it would be much more difficult to optimize the code. For example, if the results of query1 can be immediately aggregated with the results of query2, then by using Futures this logic can go directly into the dispatch code, where the identities of all queries are already known; with callbacks, Master would have to identify each query result and also determine whether it can be aggregated with any other results that have already been returned.
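A sketch of that Master/Worker idea (the Worker actor, the query strings and the aggregation are all hypothetical), using ask to get a Future per query and Futures.sequence to keep the results in a fixed order:

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.dispatch.Futures;
import akka.dispatch.OnComplete;
import akka.pattern.Patterns;
import akka.util.Timeout;
import scala.concurrent.ExecutionContext;
import scala.concurrent.Future;

import java.util.Arrays;
import java.util.concurrent.TimeUnit;

public class OrderedAggregation {
    // A trivial worker that pretends to run a query and replies with a row count.
    static class Worker extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                .match(String.class, query -> getSender().tell(query.length(), getSelf()))
                .build();
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("demo");
        ExecutionContext ec = system.dispatcher();
        Timeout timeout = new Timeout(5, TimeUnit.SECONDS);

        ActorRef worker1 = system.actorOf(Props.create(Worker.class));
        ActorRef worker2 = system.actorOf(Props.create(Worker.class));

        // ask() returns a Future immediately; the Master-side code can then
        // combine the futures in whatever order the aggregation requires.
        Future<Object> first  = Patterns.ask(worker1, "select * from orders", timeout);
        Future<Object> second = Patterns.ask(worker2, "select * from trades", timeout);

        // sequence() preserves the listed order, so "first" is aggregated before "second".
        Future<Iterable<Object>> both = Futures.sequence(Arrays.asList(first, second), ec);

        both.onComplete(new OnComplete<Iterable<Object>>() {
            public void onComplete(Throwable failure, Iterable<Object> results) {
                if (failure == null) System.out.println("in-order results: " + results);
                system.terminate();
            }
        }, ec);
    }
}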
I have to write a heavy-load system with a pretty simple task to perform, so I decided to split the work among multiple workers in different locations (or clouds). To communicate, I want to use a RabbitMQ queue.
In my system there will be two kinds of software nodes: schedulers and workers. Schedulers take user input from queue_input, split it into smaller tasks, and put those smaller tasks into workers_queue. Workers read this queue and 'do the thing'. I used round-robin load balancing here, and everything works pretty well, until some worker crashes. Then I lose information about task completion (it's not allowed to perform a single operation twice; each task contains a pack of 50 iterations of the worker code with different data).
I'm considering something like a technical_queue - another channel for scheduler-worker communication - and I wonder how to design it well. I used the tutorials from the RabbitMQ page, so my worker thread looks like:
while (true) {
    message = consume(QUEUE, ...);   // blocking consume from workers_queue
    handle(message);                 // do 50 simple tasks in a loop for the data in this message
}
How can I handle the second queue? Another thread with its own while(true) {} loop, or is there a better solution? Maybe I should reuse the existing queue with a topic exchange? (But I wanted an independent channel of communication while handling a task, which may take some time.)
You should probably take a look at Spring AMQP (doc). I hate to tell you to add a layer, but that Spring library takes care of the threading issues and thread management with its SimpleMessageListenerContainer. Each container listens to a queue, and you can specify the number of threads (i.e. workers) per queue.
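A minimal sketch of what such a container setup could look like (the queue name matches the question; everything else, including the consumer count and connection details, is just an example):

import org.springframework.amqp.core.Message;
import org.springframework.amqp.core.MessageListener;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;

public class WorkerContainer {
    public static void main(String[] args) {
        CachingConnectionFactory connectionFactory = new CachingConnectionFactory("localhost");

        SimpleMessageListenerContainer container = new SimpleMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);
        container.setQueueNames("workers_queue");       // queue name from the question
        container.setConcurrentConsumers(5);            // number of worker threads for this queue
        container.setMessageListener(new MessageListener() {
            @Override
            public void onMessage(Message message) {
                handle(message);                        // do the 50 simple tasks here
            }
        });
        container.start();
    }

    static void handle(Message message) {
        // placeholder for the real per-message work
    }
}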
Alternatively you can roll your own using an ExecutorService, but you will probably end up rewriting what SimpleMessageListenerContainer already does. You could also just launch (via the OS or batch scripts) more processes, which adds more consumers to each queue.
As far as queue topology is concerned, it is entirely dependent on business logic/concerns and generally less on performance needs. More often you have more queues for business reasons and more workers for performance reasons, but if a queue gets backed up with the same type of message, consider giving that message type its own queue. What you're describing sounds like two queues, with multiple consumers on your worker queue.
Other than the threading issue and queue topology I'm not entirely sure what else you are asking.
I would recommend you create a second consumer for the queue:
consumer1 -> queue_process
consumer2 -> queue_process
Both consumers should listen to the same queue.
Greetings, I hope this helps.
I am also thinking of integrating the Disruptor pattern into our application. I am a bit unsure about a few things before I start using the Disruptor.
I have 3 producers: mainly a FIX thread which de-serialises the requests, another thread which continuously modifies the order price as the market moves, and one more thread which is responsible for de-serialising requests sent from a GUI application. All three threads currently write to a blocking queue (hence we see a lot of contention on the queue).
The Disruptor talks about the Single Writer Principle, and from what I have read that approach scales best. Is there any way we could make the above three threads obey the single writer principle?
Also, in a typical request/response application, and especially in our case, we have contention on an in-memory cache: we need to lock the cache when we update it with a response while a request might be arriving for the same order. How do we handle this through the Disruptor, i.e. how do I tie a response to a particular request? Can I eliminate the lock on the cache, and if so, how?
Any suggestions/pointers would be highly appreciated. We are currently using Java 1.6
I'm new to the Disruptor and am trying to understand as many use cases as possible. I have tried to answer your questions.
Yes, the Disruptor can be used to sequence calls from multiple producers. I understand that all 3 threads try to update the state of a shared object, with a single consumer taking the necessary action on that shared object. Internally, you can have that single consumer delegate calls to the appropriate single-threaded handler based on responsibility. The Disruptor does exactly this: it sequences the calls so that the state is accessed by only one thread at a time. If there's a specific order in which the event handlers are to be invoked, set up a memory barrier. The latest version of the Disruptor has a DSL that lets you set up the ordering easily.
The cache can be abstracted and accessed through the Disruptor. At any given time, only one reader or writer gets access to the cache, since all calls to the cache are sequential.
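To make that concrete, here is a rough sketch with the Disruptor DSL, assuming Disruptor 3.x and Java 8 syntax for brevity (the question mentions Java 1.6, so adapt accordingly); OrderEvent and its fields are hypothetical:

import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.YieldingWaitStrategy;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;
import java.util.concurrent.Executors;

public class OrderDisruptor {
    // Mutable event slot reused by the ring buffer (the fields are hypothetical).
    static class OrderEvent {
        String orderId;
        String payload;
    }

    public static void main(String[] args) {
        Disruptor<OrderEvent> disruptor = new Disruptor<>(
                OrderEvent::new,                      // event factory
                1024,                                 // ring buffer size (power of two)
                Executors.defaultThreadFactory(),
                ProducerType.MULTI,                   // FIX, price-update and GUI threads all publish
                new YieldingWaitStrategy());

        // Single consumer: the only code that ever touches the shared state / cache.
        disruptor.handleEventsWith(new EventHandler<OrderEvent>() {
            @Override
            public void onEvent(OrderEvent event, long sequence, boolean endOfBatch) {
                // update the in-memory cache here; access is strictly sequential, so no lock
            }
        });
        disruptor.start();

        RingBuffer<OrderEvent> ring = disruptor.getRingBuffer();

        // Any of the three producer threads publishes like this:
        long seq = ring.next();
        try {
            OrderEvent event = ring.get(seq);
            event.orderId = "ORDER-1";
            event.payload = "new price";
        } finally {
            ring.publish(seq);
        }
    }
}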
I've searched the site a bit for help understanding this, but haven't found anything super clear, so I thought I'd post my use case and see if anybody could shed some light.
I have a question about the scaling of jvm threads vs os threads when used in akka for io operations. From the akka site:
Akka supports dispatchers for both event-driven lightweight threads, allowing creation of millions threads on a single workstation, and thread-based Actors, where each dispatcher is bound to a dedicated OS thread.
The event-based Actors currently consume ~600 bytes per Actor which means that you can create more than 6.5 million Actors on 4 G RAM.
In this context, can you help me understand how that matters on a workstation with only 1 processor (for simplicity)? For my example use case, I want to take a list of, say, 1000 'Users' and then query a database (or several) for various information about each user. If I were to dispatch each of these 'get' tasks to an actor, and that actor is going to do IO, wouldn't that actor block based on the OS thread limit for the workstation?
How does the Akka actor model give me any lift in a scenario like this? I know that I am probably missing something, as I am not wildly knowledgeable about the inner workings of VM threads vs OS threads, so if one of the smart folks here could spell it out for me, that would be great.
If I use Futures, don't I need to use await() or get() to block and wait for the reply?
In my use case, regardless of actors, would it end up just 'feeling' like I'm making 1000 sequential database requests?
If code snippets are useful in helping me understand this, Java would be preferred, as I am still coming up to speed on Scala syntax - but a nice clear textual explanation of how these millions of threads can interoperate on a single-processor machine while doing database IO would be fine too.
It is really hard to figure out what you are actually asking here, but here are some pointers:
If you are running on a modern JVM, there is typically a one-to-one relationship between Java threads and OS threads. (IIRC, Solaris allows you to do this differently ... but that's the exception.)
The amount of real parallelism you will get using threads, or anything built on top of threads, is limited by the number of processors / cores that are available to the application. Beyond that, you will find that not all threads are actually executing at any given instant.
If you have 1000 Actors all trying to access the database "at the same time", then most of them will actually be waiting on the database itself, or on the thread scheduler. Whether this amounts to making 1000 sequential requests (i.e. strict serialization) will depend on the database and the queries / updates that the actors are doing.
The bottom line is that a computer system has hard limits on the resources available for doing stuff; e.g. number of processors, speed of processors, memory bandwidth, disc access times, network bandwidth, etc. You can design an application to be smart about the way it uses available resources, but you can't get it to use more resources than there actually are.
On reading the text that you quoted, it seems to me that it is talking about two different kinds of actors:
Thread-based actors have a 1-to-1 relationship with threads. There's no way you could have millions of this kind of actor in 4 GB of memory.
Event-based actors work differently. Instead of having a thread at all times, they mostly sit in a queue waiting for an event to happen. When one does, an event-processing thread grabs the actor from the queue and executes the "action" associated with the event. When the action finishes, the thread moves on to another actor / event pair.
The quoted text is saying that the memory overhead of an event-based actor is ~600 bytes. That figure doesn't include the event thread ... because the event thread is shared by multiple actors.
Now I'm not an expert on Scala / Actors, but it is pretty obvious that there are certain things that you should avoid when using event-based actors. For instance, you should probably avoid talking directly to an external database because that is liable to block the event processing thread.
I think there may be a typo there. I think they meant to say:
Akka supports dispatchers for both event-driven lightweight actors,
allowing creation of millions actors on a single workstation, and thread-based Actors, where each actor is bound to a dedicated OS thread.
The event-driven actors use a thread pool - all of the (potentially millions of) actors share the same pool of threads. I'm not that familiar with Akka actors, but generally you would not want to do blocking I/O with event-driven actors; otherwise you could cause starvation.
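One common way to avoid blocking the shared pool is to push the blocking database call onto a dedicated thread pool and pipe the result back to the actor as an ordinary message. A sketch with the classic Akka Java API (UserActor, FetchUser and lookupUserInDb are hypothetical names, and the pool size is just an example):

import akka.actor.AbstractActor;
import akka.dispatch.ExecutionContexts;
import akka.dispatch.Futures;
import akka.pattern.Patterns;
import scala.concurrent.ExecutionContext;
import scala.concurrent.Future;
import java.util.concurrent.Executors;

class UserActor extends AbstractActor {
    // Dedicated pool for blocking JDBC work, separate from the actor dispatcher.
    private static final ExecutionContext blockingEc =
        ExecutionContexts.fromExecutorService(Executors.newFixedThreadPool(20));

    static final class FetchUser { final String id; FetchUser(String id) { this.id = id; } }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(FetchUser.class, msg -> {
                // Run the blocking lookup off the actor dispatcher...
                Future<String> result = Futures.future(() -> lookupUserInDb(msg.id), blockingEc);
                // ...and deliver the result to the requester as a message later;
                // the actor's own thread is never blocked.
                Patterns.pipe(result, getContext().dispatcher()).to(getSender());
            })
            .build();
    }

    private String lookupUserInDb(String id) {
        // blocking JDBC call would go here
        return "user-" + id;
    }
}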