I have an actor called a TaskRunner. The tasks can take up to 1 minute to run. Because of the library I use, there can only be one actor per JVM/node. I have 1000 of these nodes across various machines.
I would like to distribute tasks to these nodes using various rules but the most important one is:
Never queue tasks in a TaskRunner node's mailbox, always wait until a TaskRunner is free before sending it a task
The way I was thinking of doing this is to have an actor on another node (let's call it the Scheduler actor) listen for registrations from the TaskRunner nodes and keep internal state of what has been sent where.
Presumably if I did this I could only ever have one instance of this Scheduler actor, because if there were more than one they wouldn't know which TaskRunner nodes were currently busy, and so tasks would end up queued in mailboxes.
Does this mean I should be using a Cluster Singleton for the Scheduler actor?
Is there a better way to achieve my goal?
I would say you need:
a dispatcher actor (a cluster singleton) that sends tasks to actors from a pool of idle workers
your TaskRunner actor should have two states: running and idle. While idle, it should re-register itself with the dispatcher actor periodically (notifying it that it is idle). The registration must be periodic because the dispatcher can lose its state if its node shuts down and the singleton moves to another node.
the dispatcher itself keeps a list of idle actors. When a new task needs to be done and the list is not empty, a worker is taken from the list and the task is sent to it (the worker could be removed from the list immediately, but it is safer to wait for an Ack to be sure the task was accepted for processing, and to re-send it to another worker if the Ack times out). A minimal sketch follows.
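Here is a minimal sketch of such a dispatcher using Akka classic's Java API (the RegisterIdle and Task message types are hypothetical names for this sketch, not from any library):

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import java.util.ArrayDeque;
import java.util.Deque;

public class Dispatcher extends AbstractActor {
    // Hypothetical message types for this sketch.
    public static final class RegisterIdle {}
    public static final class Task {
        public final String payload;
        public Task(String payload) { this.payload = payload; }
    }

    private final Deque<ActorRef> idleWorkers = new ArrayDeque<>();

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(RegisterIdle.class, msg -> {
                // Workers re-register periodically, so avoid duplicates.
                if (!idleWorkers.contains(getSender())) {
                    idleWorkers.add(getSender());
                }
            })
            .match(Task.class, task -> {
                if (idleWorkers.isEmpty()) {
                    // No free worker: buffer or reject here, but never
                    // queue the task in a busy worker's mailbox.
                } else {
                    // In a real system, wait for an Ack and re-send to
                    // another worker if the Ack times out.
                    idleWorkers.poll().tell(task, getSelf());
                }
            })
            .build();
    }
}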
Given your requirement, rather than building everything from scratch, you might want to consider adapting Lightbend's distributed-workers template, which employs a pull model. It primarily consists of 1) a master cluster singleton that maintains the state of the workers, and 2) an actor system of workers that register with and pull work from the master singleton actor.
I adapted a repurposed version of the template for an R&D project in the past and it delivered the work-pulling functionality as advertised. Note that the template uses the retired Activator (which can easily be detached or replaced with sbt in the main code). It also does distributed pub-sub and persistence journaling, which you can elect to exclude if not needed. Its source code is available on GitHub.
Referring to your approach of a singleton master and multiple workers: there can be a situation where your master is overloaded with tasks to schedule, which may increase the time it takes to assign tasks to the workers.
So instead of making the master a cluster singleton, you can have multiple masters, each with a subset of the workers assigned to it.
The distribution of work to the different masters can be done through cluster sharding, based on a sharding key.
Akka provides cluster sharding out of the box; a sketch follows.
And to make your masters fault tolerant, you can always implement them as persistent actors.
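Here is a minimal sketch of sharding the masters with Akka classic cluster sharding (the Master actor and JobMessage type are hypothetical, and mapping the sharding key onto 10 master entities is an arbitrary choice for illustration):

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.cluster.sharding.ClusterSharding;
import akka.cluster.sharding.ClusterShardingSettings;
import akka.cluster.sharding.ShardRegion;

public class ShardedMasters {
    public static final class JobMessage implements java.io.Serializable {
        public final String jobId;
        public JobMessage(String jobId) { this.jobId = jobId; }
    }

    // Trivial master for illustration; yours would schedule tasks to its workers.
    public static class Master extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                .match(JobMessage.class, job -> System.out.println("scheduling " + job.jobId))
                .build();
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("cluster");

        ShardRegion.MessageExtractor extractor = new ShardRegion.MessageExtractor() {
            @Override public String entityId(Object message) {
                // Map the sharding key (the job id) onto one of 10 master entities.
                return String.valueOf(Math.floorMod(((JobMessage) message).jobId.hashCode(), 10));
            }
            @Override public Object entityMessage(Object message) { return message; }
            @Override public String shardId(Object message) {
                return entityId(message); // one master per shard, for simplicity
            }
        };

        ActorRef masterRegion = ClusterSharding.get(system).start(
            "master", Props.create(Master.class),
            ClusterShardingSettings.create(system), extractor);

        // The region routes each message to the master that owns its shard.
        masterRegion.tell(new JobMessage("job-42"), ActorRef.noSender());
    }
}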
Related
I have a service in place that monitors the key expiry topic __keyevent@*__:expired in Redis. I am running 3 instances of the service, which means 3 message listeners.
The RedisKeyExpirationListener is set up based on the suggestion in this solution: https://developpaper.com/implementation-code-of-expired-key-monitoring-in-redis-cluster/
The above solution suggests using a distributed Redis lock to make sure the same event is not processed again by a different node. Is there a different solution to make sure that Redis passes an event to just 1 node, so that we get true parallel processing across the 3 nodes rather than the same event being processed by different nodes?
I know how to implement a distributed lock with Redis, but I want to understand whether there are settings that ensure the event is sent to only 1 active message listener and not to all the key expiration listeners.
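There is no such setting: keyspace notifications are ordinary pub/sub broadcasts, so every subscriber receives every event, and deduplication has to happen on your side, e.g. with the lock the linked article describes. Here is a minimal sketch of that lock, assuming the Jedis 3.x client (the "lock:" prefix and the 60-second TTL are arbitrary choices, not Redis requirements):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class ExpiredEventHandler {
    private final Jedis jedis = new Jedis("localhost", 6379);

    public void onExpiredKey(String expiredKey) {
        // SET NX EX is atomic: only one of the 3 listeners wins the lock,
        // so only that instance processes the event.
        String result = jedis.set("lock:" + expiredKey, "1",
            SetParams.setParams().nx().ex(60));
        if ("OK".equals(result)) {
            process(expiredKey); // we own this event
        }
        // Otherwise another instance is already handling it.
    }

    private void process(String key) { /* business logic */ }
}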
I have the following use case on a Spring-based Web application:
I need to apply the Competing Consumers EIP with the following twist: the messages in the queue are actually split tasks belonging to the same job. Therefore, I need to properly track when all tasks of a job have completed, and their completion status, in order to save the job as either COMPLETED or FAILED, log the outcome, and notify the users accordingly (e.g. by e-mail).
So, given the requirements I described above, my question is:
Can this be done with RabbitMQ, and if so, how?
I created a quick gist to show a very crude example of how one could do it. In this example there is one producer and 2 consumers, and 2 queues: one ("SEND") that the producer publishes to and the consumers consume from, and one ("RECV") that the consumers publish to and the producer consumes from.
Now bear in mind this is a pretty crude example, as the producer in that case sends just one job (a random number of tasks between 0 and 5) and blocks until the job is done. A way around this is to store a job id and its number of tasks in a Map, and on every completion message check the number of tasks reported done for that job id; a sketch of that follows.
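Here is a minimal sketch of that tracking on the producer side, using the RabbitMQ Java client (this is not the actual gist code; the "SEND"/"RECV" queue names follow the description above, and the jobId:taskIndex message format is an illustrative assumption):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Producer {
    private final Map<String, Integer> remainingTasks = new ConcurrentHashMap<>();

    public void run() throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.queueDeclare("SEND", false, false, false, null);
        channel.queueDeclare("RECV", false, false, false, null);

        // Publish a job of 3 tasks and remember how many completions to expect.
        String jobId = "job-1";
        remainingTasks.put(jobId, 3);
        for (int i = 0; i < 3; i++) {
            channel.basicPublish("", "SEND", null, (jobId + ":" + i).getBytes());
        }

        // Each completion message decrements the count; at zero the job is
        // done, so the producer never has to block on a single job.
        DeliverCallback onTaskDone = (consumerTag, delivery) -> {
            String doneJobId = new String(delivery.getBody()).split(":")[0];
            if (remainingTasks.merge(doneJobId, -1, Integer::sum) == 0) {
                System.out.println(doneJobId + " completed");
            }
        };
        channel.basicConsume("RECV", true, onTaskDone, consumerTag -> {});
    }
}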
What you are trying to do is beyond the scope of RabbitMQ. RabbitMQ is for sending and receiving messages, with the ability to queue them.
It can't track your job tasks for you.
You will need a "Job Storage" service. Whenever a consumer finishes a task, it updates the Job Storage service, marking the task as done. The Job Storage service knows how many tasks are in each job, and when the last task is done it marks the job as succeeded. In this service you will also implement all your other business logic, such as when to treat a job as failed; a sketch follows.
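Here is a minimal in-memory sketch of such a service (the class and method names are hypothetical; in production this state would live in a database shared by all consumers):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class JobStorageService {
    public enum Status { RUNNING, COMPLETED, FAILED }

    private final Map<String, AtomicInteger> pendingTasks = new ConcurrentHashMap<>();
    private final Map<String, Status> jobStatus = new ConcurrentHashMap<>();

    // Called once when the job is split into tasks and queued.
    public void jobStarted(String jobId, int taskCount) {
        pendingTasks.put(jobId, new AtomicInteger(taskCount));
        jobStatus.put(jobId, Status.RUNNING);
    }

    // Called by a consumer each time it finishes one task of the job.
    public void taskDone(String jobId, boolean success) {
        if (!success) {
            // Example business rule: one failed task fails the whole job.
            jobStatus.put(jobId, Status.FAILED);
        }
        if (pendingTasks.get(jobId).decrementAndGet() == 0) {
            if (jobStatus.replace(jobId, Status.RUNNING, Status.COMPLETED)) {
                // Last task finished and nothing failed: log the outcome
                // and notify the users, e.g. by e-mail.
            }
        }
    }
}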
I have a Java/Akka based application where one Akka actor tells another Akka actor to do certain jobs, which it runs in a command prompt. But if I give it 10 jobs, it starts all of them at once in 10 command prompts.
If I have 100+ jobs, my system will hang.
So how can I make my application run one job at a time, with all the other jobs getting the CPU in FIFO (first in, first out) order?
The question is not quite clear, but I'll try to answer based on my understanding.
So, it looks like you use an actor as a job dispatcher which translates job messages into calls to some "job executor system". Each incoming message is translated into some call.
If this call is synchronous (which of course smells when working with actors, but just for the sake of understanding), then there is no problem in your case: your actor waits until the call is complete, then proceeds with the next message in its mailbox.
If that call is asynchronous, which I guess is what you have, then all the messages will be handled one by one without waiting for each other to complete.
So you need to throttle the message handling in order to have at most one message being processed at a time. This can be achieved by the "pull" pattern which is described here.
You basically allocate one master actor which holds a queue of incoming messages (jobs) and one worker actor which asks for a job whenever it is free; a minimal sketch follows. Be careful with the queue in the master actor - you probably don't want it to grow too much, so think about monitoring it and applying back-pressure, which is another big topic, covered by akka-stream.
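Here is a minimal sketch of that master in Akka classic's Java API (the Job and GiveMeWork message names are hypothetical):

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import java.util.ArrayDeque;
import java.util.Deque;

public class Master extends AbstractActor {
    // Hypothetical message types for this sketch.
    public static final class Job {
        public final String command;
        public Job(String command) { this.command = command; }
    }
    public static final class GiveMeWork {}

    private final Deque<Job> queue = new ArrayDeque<>();
    private ActorRef idleWorker; // null while the worker is busy

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(Job.class, job -> {
                // Monitor this queue's size and apply back-pressure if it grows.
                queue.add(job);
                dispatch();
            })
            .match(GiveMeWork.class, msg -> {
                // The worker sends this at startup and after finishing each job.
                idleWorker = getSender();
                dispatch();
            })
            .build();
    }

    private void dispatch() {
        // Hand out a job only when the single worker has asked for one,
        // so at most one job runs at a time, in FIFO order.
        if (idleWorker != null && !queue.isEmpty()) {
            idleWorker.tell(queue.poll(), getSelf());
            idleWorker = null;
        }
    }
}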
I have to build a heavy-load system with a pretty easy task to perform, so I decided to split the task across multiple workers in different locations (or clouds). To communicate I want to use a RabbitMQ queue.
In my system there will be two kinds of software nodes: schedulers and workers. Schedulers will take user input from queue_input, split it into smaller tasks, and put these smaller tasks into workers_queue. Workers read this queue and 'do the thing'. I used round-robin load balancing here, and all works pretty well as long as no worker crashes. When one does, I lose information about task completion (it's not allowed to do a single operation twice; each task contains a pack of 50 iterations of the worker code with different data).
I am considering something like a technical_queue - another channel for scheduler-worker communication - and I wonder how to design it in a good way. I used the tutorials from the RabbitMQ page, so my worker thread looks like:
while (true) {
    message = consume(QUEUE, ...);
    handle(message); // do 50 simple tasks in a loop for the data in the message
}
How can I handle the second queue? Another thread with its own while(true) {} loop, or is there a better solution? Maybe I should reuse the existing queue with a topic exchange? (But I wanted an independent channel of communication while handling the task, which may take some time.)
You should probably take a look at spring-amqp (doc). I hate to tell you to add a layer, but that Spring library takes care of the threading issues and the management of threads with its SimpleMessageListenerContainer; a sketch follows. Each container goes to a queue, and you can specify the number of threads (i.e. workers) per queue.
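Here is a minimal sketch, assuming the workers_queue name from the question and a local broker:

import org.springframework.amqp.core.Message;
import org.springframework.amqp.core.MessageListener;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;

public class WorkerMain {
    public static void main(String[] args) {
        CachingConnectionFactory connectionFactory = new CachingConnectionFactory("localhost");

        SimpleMessageListenerContainer container = new SimpleMessageListenerContainer(connectionFactory);
        container.setQueueNames("workers_queue");
        container.setConcurrentConsumers(5); // 5 worker threads on this queue

        MessageListener listener = (Message message) -> handle(message.getBody());
        container.setMessageListener(listener);
        container.start();
    }

    private static void handle(byte[] body) {
        // do the 50 simple tasks for the data in this message
    }
}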
Alternatively, you can make your own using an ExecutorService, but you will probably end up rewriting what SimpleMessageListenerContainer does. Also, you could just launch more processes (via the OS or batch scripts), and that will add more consumers to each queue.
As far as queue topology is concerned, it is entirely dependent on business logic/concerns and generally less on performance needs. More often you have more queues for business reasons and more workers for performance reasons, but if a queue gets backed up with the same type of message, consider giving that message type its own queue. What you're describing sounds like two queues, with multiple consumers on your worker queue.
Other than the threading issue and queue topology, I'm not entirely sure what else you are asking.
I would recommend you create a second consumer of the queue:
consumer1 -> queue_process
consumer2 -> queue_process
Both consumers should listen to the same queue.
Greetings, I hope this helps.
I'm new to Akka, and I apologize in advance if this is a basic question. I'm not sure how to use actors to implement the following scenario, or if it's even possible (or desirable).
I have a number of actors (i.e. producers) responsible for maintaining certain pieces of state concurrently, all of which notify another actor (i.e. consumer) when changes occur.
The consumer is to run a certain task repeatedly, a task that requires state from all producers just to start. It must also respond to changes in the state when it receives messages from the producers.
Before considering Akka, I'd kind of rolled my own simple actor model, with each actor running in its own thread. The run() methods would monitor an event queue, so I could have the consumer continually do something similar to this:
while not done
    poll the event queue
    if something was polled
        process the event
    if all state is available
        do one step of the long running task
The continual polling of the event queue didn't sit well with me, but it at least made progress on the long running task between events.
Is there a best way to use Akka actors to implement this? I could implement a "heartbeat" message which is sent to the consumer (or which the consumer sends internally to itself) to do another step of the long-running task, but I don't like the thought of having it scheduled, since the steps in the long-running task aren't uniform in duration. I don't want to queue up iterations that keep it so busy it can't quickly respond to messages from the producers. But I also don't want to schedule it too infrequently, so that it sits idle when it could be making progress...
Or would it be more appropriate to use a Dataflow model of concurrency for this (something I've only read about)? The consumer can't start until the state is all bound, so it seems natural to define the process in terms of Dataflow variables. But if Dataflow variables can only be bound once, it doesn't seem appropriate for getting repeated updates in state from the producers.
You can have the producers publish their changes to an Akka EventBus and have the consumer register to listen for these events; then, when it has all the state it needs, it can process the full chunk, or spawn a new actor that processes the full chunks. A minimal sketch follows.
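Here is a minimal sketch using Akka classic's built-in event stream (an EventBus implementation); the StateChanged event type and the producer count are hypothetical assumptions for this sketch:

import akka.actor.AbstractActor;
import java.util.HashMap;
import java.util.Map;

public class Consumer extends AbstractActor {
    // Hypothetical event type published by the producers.
    public static final class StateChanged {
        public final String producerId;
        public final Object state;
        public StateChanged(String producerId, Object state) {
            this.producerId = producerId;
            this.state = state;
        }
    }

    private static final int PRODUCER_COUNT = 3; // assumed number of producers
    private final Map<String, Object> latestState = new HashMap<>();

    @Override
    public void preStart() {
        // Receive every StateChanged event published on the event stream.
        getContext().getSystem().getEventStream().subscribe(getSelf(), StateChanged.class);
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(StateChanged.class, event -> {
                latestState.put(event.producerId, event.state);
                if (latestState.size() == PRODUCER_COUNT) {
                    // All producers have reported at least once: process the
                    // full chunk here, or spawn a child actor so this actor
                    // stays responsive to further state changes.
                }
            })
            .build();
    }
}

A producer would then publish with getContext().getSystem().getEventStream().publish(new StateChanged(id, state)).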