Context:
I am designing an application that will consume messages from various Amazon SQS queues (more than 25 queues).
For this, I am thinking of creating a library to consume messages from the queues (call it MessageConsumer).
I want to dynamically allocate threads to receive and process messages from the different queues based on the traffic in each queue, to minimise waste of resources.
There are 2 ways I can go about it.
1) Have only one type of thread that polls queues, receives messages and processes them, with one common thread pool for all queues.
2) Have separate polling and worker threads.
In the second case, I will have a common worker thread pool and a constant number of pollers per queue.
Edit:
To elaborate on the second case:
I am planning to have one continuously running thread per queue that checks the number of messages in that queue, plus some logic to decide the number of polling threads required per queue based on the number of messages in each queue and the priority of the queue.
I don't want polling threads running all the time because that may cause empty receives (sqs.receiveMessages()), so I will allocate the polling threads based on traffic.
The high traffic queues will have more polling threads and hence more jobs being submitted to the worker thread pool.
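As a rough sketch of what I mean (the class name, the scaling rule and the 30-second interval are just placeholders, assuming the AWS SDK for Java v2):

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.QueueAttributeName;

import java.util.concurrent.ExecutorService;

// Sketch of the per-queue monitor thread: it checks the queue depth and decides how
// many poller tasks this queue should currently get. The poller task itself
// (long-poll receive + submit to the shared worker pool) is only stubbed out here.
public class QueueMonitor implements Runnable {

    private final SqsClient sqs = SqsClient.create();
    private final String queueUrl;                 // placeholder
    private final int priority;                    // higher priority = more pollers allowed
    private final ExecutorService pollerPool;      // pollers for all queues
    private final ExecutorService workerPool;      // shared worker pool for all queues

    public QueueMonitor(String queueUrl, int priority,
                        ExecutorService pollerPool, ExecutorService workerPool) {
        this.queueUrl = queueUrl;
        this.priority = priority;
        this.pollerPool = pollerPool;
        this.workerPool = workerPool;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            // Ask SQS how deep the queue currently is.
            int backlog = Integer.parseInt(sqs.getQueueAttributes(r -> r
                            .queueUrl(queueUrl)
                            .attributeNames(QueueAttributeName.APPROXIMATE_NUMBER_OF_MESSAGES))
                    .attributes().get(QueueAttributeName.APPROXIMATE_NUMBER_OF_MESSAGES));

            // Placeholder rule: one poller per 100 backlogged messages, capped by priority.
            int pollersWanted = Math.min(priority, backlog / 100 + (backlog > 0 ? 1 : 0));
            for (int i = 0; i < pollersWanted; i++) {
                pollerPool.submit(this::pollOnce);
            }

            try {
                Thread.sleep(30_000);              // re-evaluate every 30 seconds
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // One long-poll receive that hands each message to the shared worker pool (omitted here).
    private void pollOnce() { /* sqs.receiveMessage(...) + workerPool.submit(...) */ }
}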
Please suggest any improvements or flaws in this design?
The recommended process is:
Workers poll the queue using long polling (which means a call will wait for a maximum of 20 seconds before returning an empty response)
They can request up to 10 messages per call to ReceiveMessage()
The worker processes the message(s)
The worker deletes the message from the queue
Repeat
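A minimal sketch of that loop using the AWS SDK for Java v2 (the queue URL and processing logic are placeholders, so treat this as an outline rather than a drop-in implementation):

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.Message;

public class Worker implements Runnable {

    private final SqsClient sqs = SqsClient.create();
    private final String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"; // placeholder

    @Override
    public void run() {
        while (true) {
            // Long polling: wait up to 20 seconds, fetch up to 10 messages per call.
            for (Message m : sqs.receiveMessage(r -> r
                    .queueUrl(queueUrl)
                    .maxNumberOfMessages(10)
                    .waitTimeSeconds(20)).messages()) {
                process(m);                        // application logic
                // Delete only after successful processing.
                sqs.deleteMessage(r -> r.queueUrl(queueUrl).receiptHandle(m.receiptHandle()));
            }
        }
    }

    private void process(Message m) { /* handle the message body */ }
}

Each worker is just a self-contained loop, so scaling is simply a matter of running more or fewer of them.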
If you wish to scale the number of workers, you can base this on the ApproximateNumberOfMessagesVisible metric in Amazon CloudWatch. If the number goes too high, add a worker. If it drops to zero (or below some threshold), remove a worker.
It is probably easiest to have each worker only poll one queue.
There is no need for "pollers". The workers do the polling themselves. This way, you can scale the workers independently, without needing some central "polling" service trying to manage it all. Simply launch a new Amazon EC2 instance, launch some workers, and they start processing messages. When scaling in, just terminate the workers or even the instance -- again, no need to register/deregister workers with a central "polling" service.
There are two important fields for controlling the concurrency level in Java GCP PubSub consumer:
Parallel pull count
Number of executor threads
From the official example:
setParallelPullCount determines how many StreamingPull streams the subscriber will open to receive messages. It defaults to 1. setExecutorProvider configures an executor for the subscriber to process messages. Here, the subscriber is configured to open 2 streams for receiving messages, each stream creates a new executor with 4 threads to help process the message callbacks. In total 2x4=8 threads are used for message processing.
So parallel pull count, if I'm not mistaken, directly refers to the number of Java executors (i.e. thread pools), and the number of executor threads sets the number of threads in each pool.
Normally I reason about separate thread pools as having different use cases or responsibilities, so we might for example have one unbounded cached thread pool for IO, a fixed thread pool for CPU-bound ops, a single (or low number) threaded pool for async IO notifications, and so on.
But what would be the benefit of having two or more thread pools with identical properties for consuming and processing pubsub messages, compared to simply having a single thread pool with maximum desired number of threads? For example, if I can spare a total of 8 threads on the subscriber, what would be the concrete reason for using 1x8 vs 2x4 combination? (a single pool of 8 threads, versus pull count=2 using 4 threads each)?
The setParallelPullCount option doesn't just refer to the number of Java Executors, it refers to the number of streams created that request messages from the server. The different streams could potentially return a different number of messages due to a variety of factors. One may want to increase parallel pull count in order to process more messages in a single client than can be transmitted on a single stream (10MB/s). This is independent of the choice of whether or not to share executors/thread pools.
Whether or not to share a thread pool across the streams would be handled by calling setExecutorProvider. If you set an ExecutorProvider that returns the same Executor on each call to getExecutor, then the streams share it. If you have it return a new Executor for each call, then they each have their own dedicated Executor. The default ExecutorProvider does the latter.
If one calls setParallelPullCount(X), then getExecutor gets called X times to get an Executor for each stream. The choice between a shared Executor for all of them or an individual one for each probably doesn't make much difference the vast majority of the time. If you are trying to keep the overall number of threads relatively low, then sharing a single Executor may help with that.
The choice between X Executors with Y threads and one Executor with X*Y threads really comes down to the ability to share those threads when the amount of data coming from each stream is vastly different, which probably isn't going to be the case most of the time. If it is, then a shared Executor means that a particularly saturated stream could "borrow" threads from an unsaturated one. On the other hand, using individual Executors could mean that in such a scenario, messages on the less saturated stream can still get through just as readily as messages on the saturated stream, because they never have to compete with it for threads.
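To make the 1x8 vs 2x4 choice concrete, here is a hedged sketch of both configurations with the Java client (project, subscription and callback are placeholders):

import com.google.api.gax.core.FixedExecutorProvider;
import com.google.api.gax.core.InstantiatingExecutorProvider;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class SubscriberConfigSketch {

    public static void main(String[] args) {
        ProjectSubscriptionName subscription =
                ProjectSubscriptionName.of("my-project", "my-subscription");   // placeholders
        MessageReceiver receiver = (message, consumer) -> {
            // process message.getData() here
            consumer.ack();
        };

        // Option A: 2 streams, each stream gets its own 4-thread executor (2x4 = 8 threads).
        Subscriber dedicated = Subscriber.newBuilder(subscription, receiver)
                .setParallelPullCount(2)
                .setExecutorProvider(InstantiatingExecutorProvider.newBuilder()
                        .setExecutorThreadCount(4)
                        .build())
                .build();

        // Option B: 2 streams sharing one 8-thread executor, so a saturated stream
        // can "borrow" threads from a quiet one.
        ScheduledExecutorService shared = Executors.newScheduledThreadPool(8);
        Subscriber sharing = Subscriber.newBuilder(subscription, receiver)
                .setParallelPullCount(2)
                .setExecutorProvider(FixedExecutorProvider.create(shared))
                .build();

        // Start whichever configuration you choose:
        dedicated.startAsync().awaitRunning();
    }
}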
I want to use Pulsar as a message queue using shared consumers and the Java client. For the time being, there are no strict ordering requirements and also no partitions. The tasks triggered by the messages usually take up to 2 seconds. Is there a clear preference for which of the following two methods of splitting the work between threads in a single application instance should be picked:
1 consumer with receive queue size 100 and 10 threads in a threadpool calling consumer.receive() in a loop.
10 consumers with receive queue size 10 each, using the MessageListener interface and running the task inside the original MessageListener.receive() call.
The best answer is: just measure it :) That said, the first approach should be more efficient, since no additional broker communication overhead is involved.
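A rough sketch of the first approach with the Pulsar Java client (service URL, topic, subscription and processing logic are placeholders):

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedConsumerSketch {

    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")            // placeholder
                .build();

        // One shared consumer with a receiver queue of 100.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("my-topic")                                // placeholder
                .subscriptionName("my-subscription")              // placeholder
                .subscriptionType(SubscriptionType.Shared)
                .receiverQueueSize(100)
                .subscribe();

        // Ten threads all calling receive() on the same consumer in a loop.
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 10; i++) {
            pool.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        Message<byte[]> msg = consumer.receive(); // blocks until a message arrives
                        process(msg);                             // the ~2 second task
                        consumer.acknowledge(msg);
                    } catch (Exception e) {
                        // log and continue; consider consumer.negativeAcknowledge(msg) on failure
                    }
                }
            });
        }
    }

    private static void process(Message<byte[]> msg) { /* application logic */ }
}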
I am using a GAE task queue to update bulk data in Datastore. The number of records is around 1-2M. To do this I scheduled a cron job and configured a queue like this:
<queue>
  <name>queueName</name>
  <rate>20/s</rate>
  <bucket-size>300</bucket-size>
  <retry-parameters>
    <task-retry-limit>1</task-retry-limit>
  </retry-parameters>
  <max-concurrent-requests>800</max-concurrent-requests>
</queue>
Each task does the following:
Fetch 1500 records from Datastore using a cursor.
If a next cursor exists, create a new task and push it onto the queue.
Process the 1500 fetched records, i.e. update all 1500 records back in Datastore.
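In simplified form (the entity kind, worker URL and the update itself are placeholders here, not my actual code), each task handler does something like:

import com.google.appengine.api.datastore.Cursor;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.QueryResultList;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class UpdateTask {

    public void handle(String webSafeCursor) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

        FetchOptions options = FetchOptions.Builder.withLimit(1500);
        if (webSafeCursor != null) {
            options.startCursor(Cursor.fromWebSafeString(webSafeCursor));
        }

        // 1. Fetch 1500 records using the cursor.
        QueryResultList<Entity> batch =
                ds.prepare(new Query("Record")).asQueryResultList(options);   // "Record" is a placeholder kind

        // 2. If there may be more records, enqueue the next task with the new cursor.
        Cursor next = batch.getCursor();
        if (next != null && batch.size() == 1500) {
            Queue queue = QueueFactory.getQueue("queueName");
            queue.add(TaskOptions.Builder
                    .withUrl("/tasks/update")                                 // placeholder handler URL
                    .param("cursor", next.toWebSafeString()));
        }

        // 3. Process the fetched records and write them back.
        for (Entity e : batch) {
            e.setProperty("updated", true);                                   // placeholder update
        }
        ds.put(batch);
    }
}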
The expected number of tasks added should be around 667, but I can only see 40 tasks in the logs.
In the logs, I can see that the 40 tasks were added to the queue within 40 seconds. I am not getting any errors in the logs.
Can anybody help me understand what is happening? Why am I not able to add all the tasks?
Thanks
In your approach the task enqueueing is very tightly coupled with the task request processing, in the sense that the request for one task in the queue needs to be processed in order to enqueue the next task. So you need to look at the rate-limiting factors you may hit on the task processing side. The ones from your queue configuration are pretty generous, but there are others.
If you configured your app with threadsafe and your app design takes advantage of it, an instance of your app will be able to handle multiple requests concurrently, up to a maximum depending on its max-concurrent-requests config and its processing latency. Without the threadsafe config that maximum is 1.
Once an instance hits the max number of task requests it can process concurrently, it won't start processing new tasks from the queue (so it won't execute step #2 - enqueueing a new task) until it completes processing at least one of the tasks already in progress. The task enqueueing rate per app instance is thus effectively limited: each running instance can contribute to the overall number of tasks in the queue only a number equal to the max number of tasks it can process in parallel.
But your app is configured for automatic scaling, so once you manage to quickly "fill up" all your running instances, the scheduler will start new instances for it. As new instances are started they will be able to process more of the tasks in the queue and thus also enqueue new tasks, contributing with the above-mentioned amount to the total number of tasks in the queue.
But this growth in the number of enqueued tasks can be much slower than before the instances hit their max processing rate - it takes some time to measure how new instances help with traffic and to determine whether more instances are needed. The overall growth in the number of tasks in the queue will have a "staircase" profile, with the height of a step being the max number of concurrent requests an instance can handle and the number of steps being the number of new instances started plus one.
Since you aren't seeing any actual task enqueueing errors, I can only suspect that you're somehow hitting a rate limit in processing your enqueued tasks, or that the processing somehow stops completely. There can be many reasons for it, including, for example:
hitting your app's daily budget (most likely due to the number of instance-hours)
hitting automatic scaling limits
You'd have to investigate your app from this perspective to pinpoint the culprit.
Side note: I assume this is on GAE, not on the development server (which doesn't respect the task queue configs and most likely can't get even close to GAE's parallel processing capability).
I have one thread putting requests onto a queue, and another cron job (thread) that runs every 15 minutes, takes all the requests from the queue, processes them, and empties the queue.
How can I manage this synchronization and make sure no requests are lost in the system?
I have thought of using a linked queue for this.
Other suggestions are welcome.
I am new to Java so asking this naive question.
In the java.util.concurrent package you have a whole bunch of queues at your disposal; however, I don't believe there's one particular queue built just for the scenario you described above.
I would recommend just picking one of the blocking queues and, in parallel, running a job every 15 minutes that drains all items from your queue.
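For example, a minimal sketch with a LinkedBlockingQueue and a ScheduledExecutorService (the Request type and the processing logic are placeholders):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RequestBuffer {

    // The producer thread calls submit(); the queue handles the synchronization.
    private final BlockingQueue<Request> queue = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start() {
        // Every 15 minutes, move everything currently in the queue into a local list
        // and process it; put() and drainTo() are thread-safe, so nothing is lost.
        scheduler.scheduleAtFixedRate(() -> {
            List<Request> batch = new ArrayList<>();
            queue.drainTo(batch);
            for (Request r : batch) {
                process(r);
            }
        }, 15, 15, TimeUnit.MINUTES);
    }

    public void submit(Request r) throws InterruptedException {
        queue.put(r);                       // called by the producer thread
    }

    private void process(Request r) { /* application logic */ }

    // Placeholder request type.
    public static class Request { }
}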
I'd like a quick confirmation of what I suspect this part of the RabbitMQ documentation says:
Callbacks to Consumers are dispatched on a thread separate from the thread managed by the Connection. This means that Consumers can safely call blocking methods on the Connection or Channel, such as queueDeclare, txCommit, basicCancel or basicPublish.
Each Channel has its own dispatch thread. For the most common use case of one Consumer per Channel, this means Consumers do not hold up other Consumers. If you have multiple Consumers per Channel be aware that a long-running Consumer may hold up dispatch of callbacks to other Consumers on that Channel.
I have various commands (messages) coming in through a single inbound queue and channel which has a DefaultConsumer attached to it. Is it correct to assume that there is a threadpool in DefaultConsumer that lets me run application logic straight off the consumer callback method, and I'm not blocking the processing of later commands? And that if it seems like there's a bottleneck, I can just give RMQ a bigger threadpool?
In addition, occasionally there is a basicPublish to the same channel from other threads. I take it that this does hold up the consumers? I guess I should make use of a new channel when doing this?
The thread pool you mentioned is not a part of DefaultConsumer but rather a part of the Connection, and it is shared between its Channels and DefaultConsumers. It allows different consumers to be invoked in parallel. See this part of the guide.
So you would expect that by increasing the size of the thread pool you can reach a higher level of parallelism. However, that's not the only factor that influences it.
There's a big caveat: incoming messages flowing through a single channel are processed serially, no matter how many threads you have in the thread pool. That's just the way ConsumerWorkService is implemented.
So to be able to consume incoming messages concurrently you either have to use multiple channels or hand those messages off to a separate thread pool of your own.
Publishes do not use threads from the Connection's thread pool, so they do not hold up consumers.
For more details you may check this post.
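For illustration, here is a hedged sketch of the second option, handing deliveries off to your own thread pool (queue name and processing logic are placeholders; it uses auto-ack so the worker threads never need to touch the channel):

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelConsumerSketch {

    public static void main(String[] args) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(8);   // your own processing pool

        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                                // placeholder
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // handleDelivery runs on the channel's dispatch thread, so it only hands
        // the work off to the worker pool and returns immediately.
        channel.basicConsume("commands", true, new DefaultConsumer(channel) {   // auto-ack
            @Override
            public void handleDelivery(String consumerTag, Envelope envelope,
                                       AMQP.BasicProperties properties, byte[] body) throws IOException {
                workers.submit(() -> process(body));
            }
        });
    }

    private static void process(byte[] body) {
        // application logic; with manual acks you would also have to coordinate
        // basicAck calls on the channel from these worker threads
    }
}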