There are two important settings for controlling the concurrency level of a Java GCP Pub/Sub consumer:
Parallel pull count
Number of executor threads
From the official example:
setParallelPullCount determines how many StreamingPull streams the subscriber will open to receive messages. It defaults to 1. setExecutorProvider configures an executor for the subscriber to process messages. Here, the subscriber is configured to open 2 streams for receiving messages; each stream creates a new executor with 4 threads to help process the message callbacks. In total, 2x4 = 8 threads are used for message processing.
So parallel pull count, if I'm not mistaken, directly refers to the number of Java executors (i.e. thread pools), and the number of executor threads sets the number of threads in each pool.
Normally I reason about separate thread pools as having different use cases or responsibilities, so we might for example have one unbounded cached thread pool for IO, a fixed thread pool for CPU-bound ops, a single-threaded (or small) pool for async IO notifications, and so on.
But what would be the benefit of having two or more thread pools with identical properties for consuming and processing pubsub messages, compared to simply having a single thread pool with maximum desired number of threads? For example, if I can spare a total of 8 threads on the subscriber, what would be the concrete reason for using 1x8 vs 2x4 combination? (a single pool of 8 threads, versus pull count=2 using 4 threads each)?
The setParallelPullCount option doesn't just refer to the number of Java Executors, it refers to the number of streams created that request messages from the server. The different streams could potentially return a different number of messages due to a variety of factors. One may want to increase parallel pull count in order to process more messages in a single client than can be transmitted on a single stream (10MB/s). This is independent of the choice of whether or not to share executors/thread pools.
Whether or not to share a thread pool across the streams would be handled by calling setExecutorProvider. If you set an ExecutorProvider that returns the same Executor on each call to getExecutor, then the streams share it. If you have it return a new Executor for each call, then they each have their own dedicated Executor. The default ExecutorProvider does the latter.
If one calls setParallelPullCount(X), then getExecutor gets called X times to get an Executor for each stream. The choice between a shared Executor across all of them and individual ones for each probably doesn't make much difference the vast majority of the time. If you are trying to keep the number of overall threads relatively low, then sharing a single Executor may be helpful in doing that.
The choice between X Executors with Y threads each and one Executor with X*Y threads really comes down to whether you want to share those threads when the amount of data coming from each stream differs greatly, which probably isn't going to be the case most of the time. If it is, then a shared Executor means a particularly saturated stream can "borrow" threads from an unsaturated one. On the other hand, individual Executors mean that in such a scenario, messages on the lightly loaded stream still get through promptly instead of waiting behind messages on the saturated stream.
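For concreteness, here is a minimal sketch of the two configurations, assuming the google-cloud-pubsub Subscriber builder (project, subscription, thread counts and the handle(...) call are illustrative placeholders):

import com.google.api.gax.core.ExecutorProvider;
import com.google.api.gax.core.FixedExecutorProvider;
import com.google.api.gax.core.InstantiatingExecutorProvider;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import java.util.concurrent.Executors;

public class SubscriberSetup {
  public static void main(String[] args) {
    String subscription = ProjectSubscriptionName.format("my-project", "my-subscription"); // placeholders

    MessageReceiver receiver = (message, consumer) -> {
      // handle(message);  // hypothetical application logic
      consumer.ack();
    };

    // Per-stream pools: each of the 2 streams gets its own 4-thread executor (2x4 = 8 threads total).
    ExecutorProvider perStream =
        InstantiatingExecutorProvider.newBuilder().setExecutorThreadCount(4).build();

    // Shared pool: both streams draw from one 8-thread executor (1x8 = 8 threads total).
    ExecutorProvider shared =
        FixedExecutorProvider.create(Executors.newScheduledThreadPool(8));

    Subscriber subscriber = Subscriber.newBuilder(subscription, receiver)
        .setParallelPullCount(2)
        .setExecutorProvider(perStream) // or pass `shared` to share threads across streams
        .build();

    subscriber.startAsync().awaitRunning();
  }
}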
Related
I want to use Pulsar as a message queue using shared consumers and the Java client. For the time being there are no strict ordering requirements, and also no partitions. The tasks triggered by the messages usually take up to 2 seconds. Is there a clear preference for which of the following two methods of splitting the work between threads in a single application instance should be picked:
1 consumer with receive queue size 100 and 10 threads in a threadpool calling consumer.receive() in a loop.
10 consumers with receive queue size 10 each, using the MessageListener interface and running the task inside the original MessageListener.receive() call.
The best answer is: just measure it :) That said, the first approach should be more efficient, since no extra broker communication overhead is involved.
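A rough sketch of the first approach, assuming the standard Pulsar Java client (broker URL, topic, subscription name and the handle(...) call are placeholders):

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.SubscriptionType;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SingleConsumerPool {
  public static void main(String[] args) throws PulsarClientException {
    PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650")   // placeholder broker address
        .build();

    Consumer<byte[]> consumer = client.newConsumer()
        .topic("tasks")                          // placeholder topic
        .subscriptionName("task-workers")        // placeholder subscription
        .subscriptionType(SubscriptionType.Shared)
        .receiverQueueSize(100)
        .subscribe();

    ExecutorService pool = Executors.newFixedThreadPool(10);
    for (int i = 0; i < 10; i++) {
      pool.submit(() -> {
        while (!Thread.currentThread().isInterrupted()) {
          try {
            Message<byte[]> msg = consumer.receive(); // blocks until a message is available
            try {
              // handle(msg.getValue());             // hypothetical ~2 s task
              consumer.acknowledge(msg);
            } catch (Exception e) {
              consumer.negativeAcknowledge(msg);      // ask the broker to redeliver on failure
            }
          } catch (PulsarClientException e) {
            break; // consumer closed, stop this worker
          }
        }
      });
    }
  }
}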
Context:
I am designing an application which will consume messages from various Amazon SQS queues (more than 25 queues).
For this, I am thinking of creating a library to consume messages from the queues (call it MessageConsumer).
I want to dynamically allocate threads to receive/process messages from different queues based on each queue's traffic, to minimise wasted resources.
There are 2 ways I can go about it.
1) Have only one type of thread that polls queues, receives messages and processes those messages, with one common thread pool for all queues.
2) Have separate polling and worker threads.
In the second case, I will have a common worker thread pool and a constant number of pollers per queue.
Edit:
To elaborate on the second case:
I am planning to have 1 continuously running thread per queue to check that queue for the number of messages in it, and then some logic to decide the number of polling threads required per queue based on the number of messages in each queue and the priority of the queue.
I don't want polling threads running all the time, because that may cause empty receives (sqs.receiveMessages()), so I will allocate the polling threads based on traffic.
The high traffic queues will have more polling threads and hence more jobs being submitted to worker thread pool.
Please point out any improvements or flaws in this design.
The recommended process is as follows (a minimal worker-loop sketch appears after the steps):
Workers poll the queue using Long Polling (which means it will wait for a maximum of 20 seconds before returning an empty response)
They can request up to 10 messages per call to ReceiveMessage()
The worker processes the message(s)
The worker deletes the message from the queue
Repeat
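A minimal single-queue worker loop along those lines, sketched with the AWS SDK for Java v2 (the queue URL and the process method are placeholders):

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;
import java.util.List;

public class QueueWorker implements Runnable {
  private final SqsClient sqs = SqsClient.create();
  private final String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"; // placeholder

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      // Long polling: wait up to 20 seconds, take up to 10 messages per call.
      List<Message> messages = sqs.receiveMessage(ReceiveMessageRequest.builder()
          .queueUrl(queueUrl)
          .waitTimeSeconds(20)
          .maxNumberOfMessages(10)
          .build()).messages();

      for (Message m : messages) {
        process(m.body()); // hypothetical application logic
        sqs.deleteMessage(DeleteMessageRequest.builder()
            .queueUrl(queueUrl)
            .receiptHandle(m.receiptHandle())
            .build());
      }
    }
  }

  private void process(String body) { /* ... */ }
}

Each worker thread (or EC2 instance) simply runs one QueueWorker per queue; scaling is then a matter of starting or stopping workers.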
If you wish to scale the number of workers, you can base this on the ApproximateNumberOfMessagesVisible metric in Amazon CloudWatch. If the number goes too high, add a worker. If it drops to zero (or below some threshold), remove a worker.
It is probably easiest to have each worker only poll one queue.
There is no need for "pollers". The workers do the polling themselves. This way, you can scale the workers independently, without needing some central "polling" service trying to manage it all. Simply launch a new Amazon EC2 instance, launch some workers and they start processing messages. When scaling in, just terminate the workers or even the instance -- again, no need to register/deregister workers with a central "polling" service.
According to this link:
server.tomcat.max-threads – Maximum amount of worker threads in server under top load. In other words, maximum number of simultaneous requests that can be handled.
Let's say that every request the Tomcat server gets spawns 3 worker threads for reading from the database.
That means that for 5 requests we have 20 threads (each request has 1 request thread plus 3 additional worker threads).
In this case, do we consider number of threads as 20 or 5 for dealing with the property server.tomcat.max-threads?
The way to limit the number of threads is to not spawn them directly.
Instead use a thread pool with a fixed upper bound on the number of threads.
The modern way to do this is to use the ExecutorService API (javadoc) and instantiate the service using either Executors.newFixedThreadPool(...) (javadoc) or directly using one of the many ThreadPoolExecutor (javadoc) constructor overloads.
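For example, a minimal sketch (the pool size and task body are illustrative assumptions):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// One bounded pool shared by all requests: at most 12 database-reader threads ever exist,
// no matter how many requests submit work to it.
ExecutorService dbWorkers = Executors.newFixedThreadPool(12);

// Inside the request-handling code, submit tasks instead of calling new Thread(...).start():
dbWorkers.submit(() -> {
    // read from the database ...
});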
In this case, do we consider number of threads as 20 or 5 for dealing with the property server.tomcat.max-threads?
Threads that are created by an application or an application thread pool while processing a request do not count as "worker threads" for the purposes of that Tomcat configuration property.
It is up to the application or its thread pool to manage any threads it creates and ensure that:
the number of these threads does not get too large,
they don't consume too many resources (CPU, memory, etc.), and
they don't get "orphaned" and end up wasting resources on a task that is no longer needed; e.g. because the original client request timed out.
Beware that this kind of thing can easily turn into a "denial of service" problem.
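One way to guard against that is to give the application's own pool both a bounded thread count and a bounded work queue with an explicit rejection policy; a rough sketch (the sizes are illustrative assumptions):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// At most 8 worker threads and at most 100 queued tasks. When both limits are hit,
// CallerRunsPolicy makes the submitting (request) thread run the task itself,
// which throttles intake instead of letting work pile up without bound.
ThreadPoolExecutor appPool = new ThreadPoolExecutor(
    8, 8,                              // core and maximum pool size
    60, TimeUnit.SECONDS,              // keep-alive for idle threads
    new ArrayBlockingQueue<>(100),     // bounded work queue
    new ThreadPoolExecutor.CallerRunsPolicy());
appPool.allowCoreThreadTimeOut(true);  // let idle core threads exit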
I'd like a quick confirmation of what I suspect this part of the RabbitMQ documentation says:
Callbacks to Consumers are dispatched on a thread separate from the thread managed by the Connection. This means that Consumers can safely call blocking methods on the Connection or Channel, such as queueDeclare, txCommit, basicCancel or basicPublish.
Each Channel has its own dispatch thread. For the most common use case of one Consumer per Channel, this means Consumers do not hold up other Consumers. If you have multiple Consumers per Channel be aware that a long-running Consumer may hold up dispatch of callbacks to other Consumers on that Channel.
I have various commands (messages) coming in through a single inbound queue and channel which has a DefaultConsumer attached to it. Is it correct to assume that there is a threadpool in DefaultConsumer that lets me run application logic straight off the consumer callback method, and I'm not blocking the processing of later commands? And that if it seems like there's a bottleneck, I can just give RMQ a bigger threadpool?
In addition, occasionally there is a basicPublish to the same channel from other threads. I take it that this does hold up the consumers? I guess I should make use of a new channel when doing this?
The thread pool you mentioned is not a part of DefaultConsumer but rather a part of the Connection, shared between its Channels and DefaultConsumers. It allows different consumers to be invoked in parallel. See this part of the guide.
So you would expect that by increasing the size of the thread pool you can reach a higher level of parallelism. However, that's not the only factor that influences it.
There's a big caveat: incoming messages flowing through a single channel are processed serially no matter how many threads you have in the thread pool. That's just how ConsumerWorkService is implemented.
So to consume incoming messages concurrently, you either have to manage multiple channels or hand those messages off to a separate thread pool.
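A minimal sketch of the hand-off approach with the RabbitMQ Java client (queue name, pool size and the handle(...) call are placeholders):

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HandOffConsumer {
  public static void main(String[] args) throws Exception {
    ConnectionFactory factory = new ConnectionFactory();
    Connection connection = factory.newConnection();
    Channel channel = connection.createChannel();

    // Our own pool: the channel's dispatch thread only enqueues work here
    // and is immediately free to deliver the next message.
    ExecutorService workers = Executors.newFixedThreadPool(8);

    channel.basicConsume("commands", false, new DefaultConsumer(channel) { // "commands" is a placeholder queue
      @Override
      public void handleDelivery(String consumerTag, Envelope envelope,
                                 AMQP.BasicProperties properties, byte[] body) {
        workers.submit(() -> {
          // handle(body);  // hypothetical application logic
          try {
            channel.basicAck(envelope.getDeliveryTag(), false); // ack after the work is done
          } catch (Exception e) {
            // log / recover; the message will be redelivered if the channel closes unacked
          }
        });
      }
    });
  }
}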
Publishes do not use threads from the Connection's thread pool, so they do not hold up consumers.
For more details you may check this post.
I have object instances of a custom class, and each instance independently processes messages (via methods) coming through to it. No instances "talk" to other instances.
My question is, is putting each object in its own thread necessary since each object processes independently real-time messages (logs etc...) coming through anyhow?
Thanks for any responses.
My question is, is putting each object in its own thread necessary since each object processes independently real-time messages (logs etc...) coming through anyhow?
You should process each message acquired by each object in a separate thread; this will lead to faster processing of the incoming messages. And since there is no interaction between the objects, no thread synchronization is needed, which is good for your application. Better still, use a pool of threads. Have a look at ThreadPoolExecutor.
It is not necessary for each object to have its own thread, however, you may gain improved performance by having more than one message processing thread. The ideal number of threads is not necessarily (or even likely) to be the same as the number of processing objects.
Typically, in a situation like you describe the approach would be to use a task / message processing queue where each object you have adds tasks to the queue, and then multiple threads process items from the queue in order. The number of threads used here is configurable so that the application can be optimized for the platform it is running on.
An easy way to achieve this design is to simply use an ExecutorService as your task queue (in which case your messages themselves must implement Runnable):
// For 2 threads, adjust as appropriate.
ExecutorService executor = Executors.newFixedThreadPool(2);
And then to add a Runnable message:
// Add a message to the queue for concurrent / asynchronous processing
executor.submit(message);
Note that the executor itself should be shared across all of your message handling objects, so that each object is adding messages to the same queue (assuming you have many message handling objects). It is also possible to have a queue per message handling object, but that decision would depend on the number of handling objects and any requirements surrounding how messages are processed.