RabbitMQ how to split jobs into tasks and handle results - Java

I have the following use case on a Spring-based Web application:
I need to apply the Competing Consumers EIP with the following twist: the messages in the queue are actually split tasks belonging to the same job. Therefore, I need to properly track when all tasks of a job have completed, and their completion status, in order to save the job as either COMPLETED or FAILED, log the outcome, and notify the users accordingly (e.g. by e-mail).
So, given the requirements I described above, my question is:
Can this be done with RabbitMQ and if yes how?

I created a quick gist to show a very crude example of how one could do it. In this example there is one producer, two consumers and two queues: one ("SEND") published to by the producer and consumed by the consumers, and conversely one ("RECV") published to by the consumers and consumed by the producer.
Bear in mind this is a pretty crude example, as the producer in that case sends just one job (a random number of tasks between 0 and 5) and blocks until the job is done. A way to circumvent this would be to store each job id and its task count in a Map, and each time a result arrives check the number of completed tasks reported for that job id.
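For illustration, here is a minimal sketch of that Map-based tracking with the RabbitMQ Java client; the queue names match the gist above, while the jobId header and the handler bodies are assumptions:

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class JobTrackingProducer {
    // jobId -> number of tasks still outstanding
    private final Map<String, Integer> remaining = new ConcurrentHashMap<>();

    public void submitJob(Channel channel, String jobId, String[] tasks) throws Exception {
        remaining.put(jobId, tasks.length); // register the job before publishing
        Map<String, Object> headers = new HashMap<>();
        headers.put("jobId", jobId);        // assumed header carrying the job id
        AMQP.BasicProperties props = new AMQP.BasicProperties.Builder().headers(headers).build();
        for (String task : tasks) {
            channel.basicPublish("", "SEND", props, task.getBytes());
        }
    }

    public void trackResults(Channel channel) throws Exception {
        channel.basicConsume("RECV", true, (consumerTag, delivery) -> {
            String jobId = delivery.getProperties().getHeaders().get("jobId").toString();
            // atomically decrement the outstanding-task counter for this job
            Integer left = remaining.computeIfPresent(jobId, (id, n) -> n - 1);
            if (left != null && left == 0) {
                remaining.remove(jobId);
                // all tasks reported back: mark the job COMPLETED, log, notify users
            }
        }, consumerTag -> { });
    }
}

This keeps the producer non-blocking: it only reacts to result messages, so it can accept new jobs while older ones are still in flight.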

What you are trying to do is beyond the scope of RabbitMQ. RabbitMQ is for sending and receiving messages, with the ability to queue them.
It can't track your job tasks for you.
You will need a "Job Storage" service. Whenever a consumer finishes a task, it updates the Job Storage service, marking the task as done. The Job Storage service knows how many tasks are in the job, and when the last task is done it marks the job as succeeded. This service is also where you implement the rest of your business logic, such as when to treat a job as failed.
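As an illustration only, a sketch of what the core of such a Job Storage service could look like; it is in-memory for brevity, whereas a real one would persist to a database, and all names are made up:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class JobStorage {
    private final Map<String, AtomicInteger> pendingTasks = new ConcurrentHashMap<>();
    private final Map<String, Boolean> failed = new ConcurrentHashMap<>();

    public void registerJob(String jobId, int taskCount) {
        pendingTasks.put(jobId, new AtomicInteger(taskCount));
    }

    // called by a consumer whenever it finishes a task
    public void taskDone(String jobId, boolean success) {
        if (!success) {
            failed.put(jobId, true); // remember that at least one task failed
        }
        if (pendingTasks.get(jobId).decrementAndGet() == 0) {
            pendingTasks.remove(jobId);
            boolean jobFailed = failed.remove(jobId) != null;
            // the last task just finished: save COMPLETED/FAILED, log, e-mail the users
            System.out.println("Job " + jobId + (jobFailed ? " FAILED" : " COMPLETED"));
        }
    }
}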

Related

Sending a message after several other messages have completed without utilizing an external store?

I have an application which should use JMS to queue several long-running tasks asynchronously in response to a specific request. Some of these tasks might complete within seconds while others might take longer. The original request should complete as soon as all the tasks have been started (i.e. the messages to start the tasks have been queued); I don't want to block the request while the tasks are being executed.
Now, however, I would like to execute another action per request once all of the messages have been processed successfully. For this, I would like to send another message to another queue - but only after all messages have been processed.
So what I am doing is a bit similar to a reply-response pattern, but not exactly: The responses of multiple messages (which were queued in the same transaction) should be aggregated and processed in a single transaction once they are all available. Also, I don't want to "block" the transaction enqueuing the messages by waiting for replies.
My first, naive approach would be the following:
When a request comes in:
Queue n messages, one for each of the n actions to be performed. Give them all the same correlation id.
Store n (i.e. the number of messages sent) in a database along with the correlation id of the messages.
Complete the request successfully
Each of the workers would do the following:
Receive a message from the queue
Do the work that needs to be done to handle the message
Decrement the counter stored in the database based on the correlation id.
If the counter has reached zero: Send a "COMPLETED" message to the completed-queue (see the sketch after this list)
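A minimal sketch of that worker logic, assuming plain JMS plus a JDBC counter table; the table and queue names are invented, and the atomic decrement-and-read uses PostgreSQL's RETURNING clause:

import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class Worker implements MessageListener {
    private final Connection db;                  // JDBC connection to the counter store
    private final Session session;                // JMS session
    private final MessageProducer completedQueue; // producer for the completed-queue

    public Worker(Connection db, Session session, MessageProducer completedQueue) {
        this.db = db;
        this.session = session;
        this.completedQueue = completedQueue;
    }

    @Override
    public void onMessage(Message message) {
        try {
            doWork(message); // the actual long-running task
            String correlationId = message.getJMSCorrelationID();
            // atomically decrement the counter and read the remaining count
            try (PreparedStatement stmt = db.prepareStatement(
                    "UPDATE pending SET remaining = remaining - 1 "
                  + "WHERE correlation_id = ? RETURNING remaining")) {
                stmt.setString(1, correlationId);
                ResultSet rs = stmt.executeQuery();
                if (rs.next() && rs.getInt(1) == 0) {
                    // this worker handled the last task of the request
                    TextMessage done = session.createTextMessage("COMPLETED");
                    done.setJMSCorrelationID(correlationId);
                    completedQueue.send(done);
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e); // real code needs rollback/retry handling here
        }
    }

    private void doWork(Message message) { /* handle the message */ }
}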
However, I am wondering if there is an alternative solution which doesn't require a database (or any other kind of external store) to keep track whether all messages have already been processed or not.
Does JMS provide some functionality which would help me with this?
Or do I really have to use the database in this case?
If your system is distributed, and I presume it is, it's very hard to solve this problem without some kind of global countdown latch like the one you have implemented. The main thing to notice is that tasks have to signal within some "global storage" that they are done. Your app is essentially creating a new countdown-latch instance (identified by the correlation ID) each time a new request comes in, by inserting a row in the database. Your tasks "signal" the end of their work by counting that latch down, and the task that counts it down to zero has to clean up the row.
Now, the global storage doesn't have to be a database, but it still has to be some kind of globally accessible state, and you have to keep counting. If the only thing you have is JMS, you have to create the latch and count it down by sending messages.
The simplest solution that comes to mind is to have each job send a TASK_ENDED message to a JOBS_FINISHED queue. A TASK_ENDED message stands for the signal "task X, triggered by request Y with correlation ID Z, has ended" - just like counting down in the database. The recipient of this queue is a special job whose only purpose is to trigger the COMPLETED message once all messages for a request with a given correlation ID have been received. So this job just reads messages sequentially and counts each unique correlation ID it encounters. Once it has counted up to the expected number, it clears that counter and sends the COMPLETED message.
You can encode the number of triggered tasks and any other specifics in the JMS headers of the messages created when processing the request. For example:
// pretend this request handling triggers 10 tasks
// here we are creating first of ten START TASK messages
TextMessage msg1 = session.createTextMessage("Start a first task");
msg1.setJMSCorrelationID(request.id);
msg1.setIntProperty("TASK_NUM", 1);
msg1.setIntProperty("TOTAL_TASK_COUNT", 10);
And then you just pass that info along in the TASK_ENDED messages, all the way to the final job. You have to make sure that all messages sent to the ending job are received by the same instance of that job.
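And a rough sketch of that final counting job itself, reusing the TOTAL_TASK_COUNT property from the snippet above (queue wiring omitted; sendCompleted is a placeholder):

import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import java.util.HashMap;
import java.util.Map;

// The single consumer of the JOBS_FINISHED queue.
public class CompletionCounter implements MessageListener {
    private final Map<String, Integer> seen = new HashMap<>(); // correlation id -> count

    @Override
    public void onMessage(Message taskEnded) {
        try {
            String correlationId = taskEnded.getJMSCorrelationID();
            int expected = taskEnded.getIntProperty("TOTAL_TASK_COUNT");
            int count = seen.merge(correlationId, 1, Integer::sum);
            if (count == expected) {
                seen.remove(correlationId);   // clear the counter for this request
                sendCompleted(correlationId); // publish COMPLETED for this request
            }
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }

    private void sendCompleted(String correlationId) { /* send to the completed-queue */ }
}

A plain HashMap suffices here precisely because the counting job runs as a single instance reading the queue sequentially.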
You could go on from here by expanding the idea with publish-subscribe messaging, error handling, temporary queues and the like, but that is getting very specific to your needs, so I'll end here.

Pulling just one message at a time

I'm currently facing the problem that I want to implement a simple master-slave pattern, where the master initializes a job queue by publishing all jobs up front to a topic. The slaves would pull those jobs whenever they have free working capacity, one job at a time. The example code on GitHub pulls multiple messages for a specific amount of time:
subscriber.startAsync().awaitRunning();
Thread.sleep(params.y());
I don't want that; I just want to pull one job message from the queue, let the slave do the work, and after the work is done call the pulling method to pull another job message, but just one at a time. Since I'm executing the jobs in an ExecutorService, I want to ensure that I don't pull any messages if my thread pool is filled. How would I pull one message, submit that job to my ExecutorService, and only pull the next job message once a job has finished and a thread is free?
Pulling a single message at a time would be considered an anti-pattern for Google Cloud Pub/Sub. You can control the number of messages delivered to your worker by specifying FlowControlSettings via the Subscriber Builder. In particular, you could call setMaxOutstandingElementCount on the FlowControlSettings Builder to limit the maximum number of messages that have been delivered to the MessageReceiver you provided. If each of your workers is individually a subscriber and wants to perform a single action at a time, you could even set this number to 1.
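For example, something along these lines with the Java client library; the project and subscription names and doJob are placeholders:

import com.google.api.gax.batching.FlowControlSettings;
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;

ProjectSubscriptionName subscription =
        ProjectSubscriptionName.of("my-project", "my-subscription");

MessageReceiver receiver = (PubsubMessage message, AckReplyConsumer consumer) -> {
    doJob(message);   // the worker's unit of work
    consumer.ack();   // the next message is delivered only as an outstanding slot frees up
};

// at most one message outstanding to this worker at any time
FlowControlSettings flowControl = FlowControlSettings.newBuilder()
        .setMaxOutstandingElementCount(1L)
        .build();

Subscriber subscriber = Subscriber.newBuilder(subscription, receiver)
        .setFlowControlSettings(flowControl)
        .build();
subscriber.startAsync().awaitRunning();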
If you need more exact control over the pull semantics for your subscriber, then you can use the gRPC library's pull method directly. The Service APIs Overview has more information on this approach.

How to restrict the akka actor to do one job at a time

I have a Java/Akka-based application where one Akka actor tells another Akka actor to do certain jobs, and it starts doing each job in a command prompt; but if I give it 10 jobs, it starts them all at once in 10 command prompts.
If I have 100+ jobs, my system will hang.
So how can I make my application do the jobs one at a time, with all the other jobs getting the CPU in FIFO (first in, first out) order?
The question is not quite clear, but I'll try to answer based on my understanding.
It looks like you use an actor as a job dispatcher which translates job messages into calls to some "job executor system": each incoming message is translated into a call.
If this call is synchronous (which smells when working with actors, of course, but just for the sake of understanding), then there is no problem in your case: your actor waits until the call is complete, then proceeds with the next message in its mailbox.
If that call is asynchronous, which I guess is what you have, then all the messages will be handled one by one without waiting for each other.
So you need to throttle the message handling so that at most one message is being processed at a time. This can be achieved with the "pull" pattern, which is described here.
You basically allocate one master actor, which has a queue of incoming messages (jobs), and one worker actor, which asks for a job whenever it is free. Be careful with the queue in the master actor - you probably don't want it to grow too much; think about monitoring and applying back-pressure, which is another big topic, covered by akka-stream.
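A bare-bones sketch of that pull pattern with classic Akka actors (Java API); it assumes the master creates the worker as its child, jobs are plain strings, and all names are invented:

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.Props;
import java.util.ArrayDeque;
import java.util.Queue;

class Master extends AbstractActor {
    private final Queue<String> jobs = new ArrayDeque<>(); // pending jobs; watch its size
    private ActorRef idleWorker;                           // worker waiting for a job, if any

    @Override
    public void preStart() {
        getContext().actorOf(Props.create(Worker.class), "worker");
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .matchEquals("workerReady", msg -> {           // the worker pulls a job
                if (jobs.isEmpty()) {
                    idleWorker = getSender();              // remember it until a job arrives
                } else {
                    getSender().tell(jobs.poll(), getSelf());
                }
            })
            .match(String.class, job -> {                  // a new job was submitted
                if (idleWorker != null) {
                    idleWorker.tell(job, getSelf());
                    idleWorker = null;
                } else {
                    jobs.add(job);                         // queue it; apply back-pressure here
                }
            })
            .build();
    }
}

class Worker extends AbstractActor {
    @Override
    public void preStart() {
        getContext().getParent().tell("workerReady", getSelf());
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(String.class, job -> {
                runJob(job);                                // one job at a time, synchronously
                getSender().tell("workerReady", getSelf()); // pull the next job only when done
            })
            .build();
    }

    private void runJob(String job) { /* start the external process and wait for it */ }
}

Because the worker only asks for work after finishing the previous job, the jobs naturally run one at a time in FIFO order.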

RabbitMQ consume from 2 queues

I have to write a heavy-load system with a pretty easy task to do, so I decided to split these tasks across multiple workers in different locations (or clouds). To communicate, I want to use a RabbitMQ queue.
In my system there will be two kinds of software nodes: schedulers and workers. Schedulers take user input from queue_input, split it into smaller tasks and put these smaller tasks into workers_queue. Workers read this queue and 'do the thing'. I used round-robin load balancing here, and all works pretty well until some worker crashes: then I lose the information about task completion (it's not allowed to do a single operation twice; each task contains a pack of 50 iterations of worker code over different data).
I'm considering something like a technical_queue - another channel for scheduler-worker communication - and I wonder how to design it in a good way. I used the tutorials from the RabbitMQ page, so my worker thread looks like:
while (true) {
    message = consume(QUEUE, ...);
    handle(message); // do 50 simple tasks in a loop for the data in this message
}
How can I handle the second queue? Another thread with some while(true) {} loop? Or is there a better solution? Maybe I should reuse the existing queue with a topic exchange? (But I wanted an independent channel of communication while handling a task, which may take some time.)
You should probably take a look at spring-amqp (doc). I hate to tell you to add a layer, but that Spring library takes care of the threading issues and the management of threads with its SimpleMessageListenerContainer. Each container goes to a queue, and you can specify the number of threads (i.e. workers) per queue.
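A minimal sketch of what that looks like with spring-amqp, using the workers_queue from the question; the connection details and the handle method are assumptions:

import org.springframework.amqp.core.Message;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;

CachingConnectionFactory connectionFactory = new CachingConnectionFactory("localhost");

SimpleMessageListenerContainer container =
        new SimpleMessageListenerContainer(connectionFactory);
container.setQueueNames("workers_queue");   // the queue this container consumes
container.setConcurrentConsumers(5);        // number of worker threads for this queue
container.setMessageListener((Message message) -> {
    handle(message);                        // do the 50 iterations for this task
});
container.start();

A second container pointed at your technical_queue gives you the independent communication channel without hand-rolling another while(true) thread.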
Alternatively you can make your own using an ExecutorService, but you will probably end up rewriting what SimpleMessageListenerContainer does. You could also just execute (via the OS or batch scripts) more processes, which will add more consumers to each queue.
As far as queue topology is concerned, it is entirely dependent on business logic/concerns and generally less on performance needs. More often you have more queues for business reasons and more workers for performance reasons, but if a queue gets backed up with the same type of message, consider giving that type of message its own queue. What you're describing sounds like two queues with multiple consumers on your worker queue.
Other than the threading issue and queue topology, I'm not entirely sure what else you are asking.
I would recommend you create a second consumer:
consumer1 -> queue_process
consumer2 -> queue_process
Both consumers should listen to the same queue.
Greetings, I hope this helps.

Akka actor that still works while waiting for new messages?

I'm new to Akka, and I apologize in advance if this is a basic question. I'm not sure how to use actors to implement the following scenario, or if it's even possible (or desirable).
I have a number of actors (i.e. producers) responsible for maintaining certain pieces of state concurrently, all of which notify another actor (i.e. consumer) when changes occur.
The consumer is to run a certain task repeatedly, a task that requires state from all producers just to start. It must also respond to changes in the state when it receives messages from the producers.
Before considering Akka, I'd kind of rolled my own simple actor model, with each actor running in its own thread. The run() methods would monitor an event queue, so I could have the consumer continually do something similar to this:
while not done
    poll the event queue
    if something was polled
        process the event
    if all state is available
        do one step of the long running task
The continual polling of the event queue didn't sit well with me, but it at least made progress on the long running task between events.
Is there a best way to use Akka actors to implement this? I could implement a "heartbeat" which sends a message to the consumer (or which the consumer sends internally to itself) to do another step of the long-running task, but I don't like the thought of having it be scheduled, since the steps of the long-running task aren't uniform in duration. I don't want to queue up iterations that keep it so busy it can't respond quickly to messages from the producers. But I also don't want to schedule it so infrequently that it sits idle when it could be making progress...
Or would it be more appropriate to use a dataflow model of concurrency for this (something I've only read about)? The consumer can't start until all the state is bound, so it seems natural to define the process in terms of dataflow variables. But if dataflow variables can only be bound once, that doesn't seem appropriate for getting repeated updates of state from the producers.
You can have the producers publish the changes to an Akka EventBus, and have the consumer register to listen for these events; when it has all it needs, it can process the full chunk, or spawn a new actor that processes the full chunks.
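A small sketch of that idea using the actor system's built-in event stream as the EventBus; StateChanged, applyChange, allStateAvailable and doOneStep are all invented names:

import akka.actor.AbstractActor;

// hypothetical event published by the producers on every state change
class StateChanged { /* fields describing the change */ }

public class Consumer extends AbstractActor {
    private static final String STEP = "step"; // self-message: do one unit of the long task
    private boolean stepping = false;

    @Override
    public void preStart() {
        // the system event stream acts as a simple EventBus
        getContext().getSystem().eventStream().subscribe(getSelf(), StateChanged.class);
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(StateChanged.class, event -> {
                applyChange(event);                    // merge the producer's update
                if (allStateAvailable() && !stepping) {
                    stepping = true;
                    getSelf().tell(STEP, getSelf());   // kick off the long-running task
                }
            })
            .matchEquals(STEP, s -> {
                doOneStep();                           // one short, bounded unit of work
                getSelf().tell(STEP, getSelf());       // the next step queues *behind* any
            })                                         // pending StateChanged messages
            .build();
    }

    private void applyChange(StateChanged event) { /* update the local copy of the state */ }
    private boolean allStateAvailable() { /* have all producers reported? */ return true; }
    private void doOneStep() { /* one step of the long-running task */ }
}

The producers would call getContext().getSystem().eventStream().publish(new StateChanged(...)). Because each STEP self-message is enqueued behind any pending events, the consumer makes steady progress on the long task while still reacting promptly to state changes, with no polling and no fixed schedule.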
