Make threads specific to a message channel in spring cloud GCP pubsub - java

I have a Spring Cloud application using GCP Pub/Sub messaging. I've got 2 inbound message channels, each subscribed to a different subscription. The problem I face during load/stress testing of the application is that, with the number of threads set as below:
spring.cloud.gcp.pubsub.subscriber.executor-threads: 350
spring.cloud.gcp.pubsub.subscriber.parallel-pull-count: 2
spring.cloud.gcp.pubsub.subscriber.max-acknowledgement-threads: 700
when the processes triggered by messages on channel 1 keep their threads busy, there are not enough threads left for channel 2 to pull messages. The solution would be to restrict/configure the number of threads for each channel, but I am having a very hard time figuring this out. Please help me out here! Below are the channels I was referring to:
@Bean
@ServiceActivator(inputChannel = "pubsubInputChannel1")
public MessageHandler extractionMessageReceiver() {
    return message -> {
        // do something
    };
}

@Bean
@ServiceActivator(inputChannel = "pubsubInputChannel2")
public MessageHandler extractionMessageReceiver2() {
    return message -> {
        // do something
    };
}
Note that a subscriber thread remains busy until the process triggered by the message it pulled has finished.

I had the following problem: when there were a lot of messages and they queued up, the actuator health check for pubsub stopped working. My assumption was that all the executor threads were busy handling the messages and the check ran into a deadline-exceeded exception.
The following flow control properties helped me to fix the issue:
spring.cloud.gcp.pubsub.[subscriber,publisher.batching].flow-control.max-outstanding-element-count
    Maximum number of outstanding elements to keep in memory before enforcing flow control.
spring.cloud.gcp.pubsub.[subscriber,publisher.batching].flow-control.max-outstanding-request-bytes
    Maximum number of outstanding bytes to keep in memory before enforcing flow control.
From https://docs.spring.io/spring-cloud-gcp/docs/1.1.0.M1/reference/html/_spring_cloud_gcp_for_pub_sub.html
When I set the max outstanding element count to 100 everything worked fine.
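For reference, in the same property format as in the question, the subscriber-side setting I used looks roughly like this (100 is simply the value that worked for my load; the request-bytes limit can be set analogously):
spring.cloud.gcp.pubsub.subscriber.flow-control.max-outstanding-element-count: 100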
I think the outstanding messages are pulled over a stream, and with the properties above we can make sure that not all of them are processed at once; instead they are handled in chunks of, for example, 100 messages, which may also give the other channel a chance to pull. Quote from https://medium.com/google-cloud/things-i-wish-i-knew-about-google-cloud-pub-sub-part-2-b037f1f08318:
Note, streaming pull only guarantees flow control on a best-effort basis. Say you’ve noted your application can only handle 100 messages in any one period, so you set max outstanding messages to 100. The client will pause once it has pulled in 100 messages, which works most of the time. However, if you then publish 500 messages in a single publish batch, the client will receive all 500 messages at once but only be able to process 100 at a time, potentially leading to a growing backlog of expired messages. This is because streaming pull can’t split up messages from a single publish batch. To avoid this, either increase your number of subscribers or decrease your batch sizes to match subscriber message processing capacity while publishing.
Could these parameters maybe solve your problem?

Related

How to scale kafka message consumption when consumers create bottleneck (high message processing time)?

We have a Kafka SDK written on top of apache-kafka (2.7.0) that we use to produce and consume messages on Kafka topics.
By default the configuration is like this -
Auto commit is set to false
We use commitSync() for offsets
poll frequency for consumers is 1000 ms
max.poll.records is set to 2
Consumers are single threaded and single consumer runs per instance/pod (we use EKS)
Now, there is an order service that produces an order-created message to the order topic, and it is consumed by another service that fulfils the order (the fulfilment service). The fulfilment logic takes on average 20s to process this message (too high!).
Because of this, even if we have 10 partitions in the topic and 10 application pods / consumers running (they all belong to the same consumer group), we can only process 3 messages per minute per consumer (30 messages per minute overall).
The problem is that the rate of message production at peak is around 300 per minute. Even if we scale to 50 partitions with 50 consumers, we can only process 150 per minute, and even then each consumer remains underutilized in terms of CPU and memory usage.
Because of this, over time, there is a huge build up in consumer lag.
How do we scale to solve this problem? We can't have 100s of underutilized consumers running as that is not cost effective. Please help with any pointers to solve this.
PS: We are looking into how to optimize the consumer logic that takes 20s on average, but that will take time and we need a short-term solution that is cost effective as well.
I'd rather suggest a "semi-lambda" architecture approach. If you are already running on k8s, use openfaas/knative to decouple the handling of these messages:
A first service that consumes the messages, verifies them and spins up lambdas to handle them.
The actual lambdas, managed by openfaas or similar; this is the classic use case when handling a message takes more than 500ms but not more than a couple of minutes. When a lambda finishes handling, it returns a response to the first service. If it's okay, commit the offset; if not, also commit, but re-send the message to the dead-letter queue.
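A rough sketch of what that first service could look like, assuming plain apache-kafka clients and Java 11's HttpClient (the OpenFaaS gateway URL, topic names, status-code handling and the raised max.poll.records are assumptions, not part of the original setup):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderDispatcher {

    private static final HttpClient HTTP = HttpClient.newHttpClient();

    // consumer: already subscribed to the order topic, ideally with max.poll.records raised well above 2
    // dlq: producer used for messages the function could not handle
    static void dispatchLoop(KafkaConsumer<String, String> consumer, KafkaProducer<String, String> dlq) {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            List<CompletableFuture<Void>> inFlight = new ArrayList<>();
            for (ConsumerRecord<String, String> record : records) {
                HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create("http://gateway:8080/function/fulfil-order")) // assumed OpenFaaS gateway
                        .POST(HttpRequest.BodyPublishers.ofString(record.value()))
                        .build();
                // fan out: all records of the batch are handled by functions in parallel
                inFlight.add(HTTP.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                        .thenAccept(response -> {
                            if (response.statusCode() != 200) {
                                dlq.send(new ProducerRecord<>("order-topic-dlq", record.key(), record.value()));
                            }
                        })
                        .exceptionally(error -> {
                            dlq.send(new ProducerRecord<>("order-topic-dlq", record.key(), record.value()));
                            return null;
                        }));
            }
            // wait for every function call of this batch, then commit regardless;
            // failures have already been re-sent to the dead-letter topic
            CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
            consumer.commitSync();
        }
    }
}

With a batch of, say, 50 records in flight, the slow 20s fulfilment calls overlap instead of queueing behind a single thread; just keep the total batch time below max.poll.interval.ms.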

Unexpected backlog size in Pulsar

I'm using Pulsar for communication between services and I'm experiencing flakiness in a quite simple test of producers and consumers.
In JUnit 4 test, I spin up (my own wrappers around) a ZooKeeper server, a BookKeeper bookie, and a PulsarService; the configurations should be quite standard.
The test can be summarized in the following steps:
1. build a producer;
2. build a consumer (say, a reader of a Pulsar topic);
3. check the message backlog (using precise backlog);
   this is done by getting the current subscription via PulsarAdmin#topics#getStats#subscriptions
   I expect it to be 0, as nothing was sent on the topic, but sometimes it is 1; this seems to be a separate problem, though...
4. build a new producer and synchronously send a message onto the topic;
5. build a new consumer and read the messages on the topic;
   I expect a backlog of one message, and I actually read one
6. build a new producer and synchronously send four messages;
7. fetch again the messages, using the messageID read at step 5 as the start message ID;
   I expect a backlog of four messages here, and most of the time this value is correct, but running the test about ten times I consistently get 2 or 5
I tried debugging the test, but I cannot figure out where those values come from; did I misunderstand something?
Things you can try if not already done:
Ask for precise backlog measurement. By default, it's only estimated as getting the precise measurement is a costlier operation. Use admin.topics().getStats(topic, true) for this. (See https://github.com/apache/pulsar/blob/724523f3051def9577d6bd27697866c99f4a7b0e/pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java#L862)
Deactivate batching on the producer side. The number returned in msgBacklog is the number of entries so multiple messages batched in a single entry will count as 1. See relevant issue : https://github.com/apache/pulsar/issues/7623. It can explain why you see a value of 2 for the msgBacklog if the 4 messages have been put in the same batch. Beware that deactivating batching can have a huge impact on performance.
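A minimal sketch of both suggestions, assuming a local broker and admin endpoint (topic name and URLs are placeholders, and the accessors on the stats object differ slightly between Pulsar client versions, e.g. older clients expose public fields such as stats.subscriptions and sub.msgBacklog instead of getters):

import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.common.policies.data.TopicStats;

public class BacklogCheck {

    public static void main(String[] args) throws Exception {
        String topic = "persistent://public/default/my-topic"; // placeholder

        // Producer with batching disabled, so every message counts as one entry in the backlog
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder
                .build();
        Producer<byte[]> producer = client.newProducer()
                .topic(topic)
                .enableBatching(false)
                .create();
        producer.send("hello".getBytes());

        // Ask the admin API for the precise backlog (second argument = true)
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder
                .build();
        TopicStats stats = admin.topics().getStats(topic, true);
        stats.getSubscriptions().forEach((name, sub) ->
                System.out.println(name + " backlog=" + sub.getMsgBacklog()));

        producer.close();
        admin.close();
        client.close();
    }
}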

How to control the number of messages that being emitted by Apache Kafka per a specific time?

I am new to Apache Kafka and I am trying to configure it so that it receives messages from the producer as fast as possible, but only delivers a configured number of messages to the consumer per specific time interval.
In other words, how do I configure Apache Kafka to send only, for example, 50 messages per 30 seconds to the consumer regardless of the number of messages, and in the next 30 seconds take another 50 of the cached messages, and so on?
If you have control over the consumer
You could use the max.poll.records property to limit the maximum number of records returned per poll() call. Then you only need to ensure that poll() is called once every 30 seconds.
In general you can take a look at all available configuration properties here.
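For example, with the plain Java consumer the relevant setting would look something like this (broker address and group id are placeholders):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // placeholder
props.put("group.id", "throttled-consumer");        // placeholder
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("max.poll.records", "50");                // at most 50 records per poll()
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);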
If you cannot control consumer
Then the only option for you is to write messages as per your demand - write at most 50 messages in 30 seconds. There are no configuration options available. Only your application logic can achieve that.
Update: how to ensure poll() is called at the right interval
The simplest way is to:
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
    // .. do your stuff with the records
    Thread.sleep(30000);
}
You can make things more sophisticated by measuring the processing time (i.e. from right after the poll() call up to the Thread.sleep()) so that you never wait more than 30 seconds in total.
The problem is that the producer doesn't really send messages to the consumer. There is a persistent Kafka topic in between where the producer places its messages, and it really doesn't care whether there is any consumer on the other side. The same holds from the consumer perspective: it just subscribes to data from the topic and doesn't care whether there is some producer on the other side. So thinking about back-pressure from the consumer down to the producer, with messaging middleware in between, is the wrong direction.
On the other hand, it is not clear how those consumed messages impact your third-party service. The point is that a Kafka consumer is single-threaded per partition, so all the messages from one partition are (and must be) processed one by one in the same thread. That way you cannot send more than one message to your service at a time: the next one can only be sent when the previous one has been answered. So think about it: how could your consumer application even exceed the rate limit?
However, if you have enough partitions and high concurrency on the consumer side, you really may end up with several requests to your service in parallel from different threads. For that case I would suggest taking a look at the Rate Limiter pattern; this library provides a good implementation: https://resilience4j.readme.io/docs/ratelimiter. It is much better to keep messages in the topic than to try to limit the producer somehow.
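A minimal sketch of that pattern with resilience4j, assuming 50 calls per 30-second window (the limiter name and callThirdPartyService are placeholders):

import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import java.time.Duration;

public class ThirdPartyRateLimit {

    private static final RateLimiterConfig CONFIG = RateLimiterConfig.custom()
            .limitRefreshPeriod(Duration.ofSeconds(30)) // window length
            .limitForPeriod(50)                         // at most 50 calls per window
            .timeoutDuration(Duration.ofMinutes(5))     // how long a caller may block waiting for a permit
            .build();

    private static final RateLimiter LIMITER = RateLimiter.of("third-party-service", CONFIG);

    // Call this from every consumer thread; excess calls block until the next window opens.
    static void handle(String message) {
        RateLimiter.decorateRunnable(LIMITER, () -> callThirdPartyService(message)).run();
    }

    private static void callThirdPartyService(String message) {
        // placeholder for the actual HTTP/RPC call
    }
}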
To conclude: even if the consumer side is not your project, it is better to discuss with that team how to improve their consumer. You did your part well: the producer sends messages to the Kafka topic. What else can you do over here?
Interesting use case, and I'm not sure why you need it, but there are two possible solutions:
1. To protect the cluster, you could use quotas, not for the number of messages but for bandwidth throughput: https://kafka.apache.org/documentation/#design_quotas
2. If you need an exact number of messages per time frame, you could put a buffering service (rate limiter) in between, where you consume, pause, and re-publish messages to the topic the end consumer reads. The rate limiter could consume the next 50, then pause until the minute passes. This will increase the space used on your cluster because of the duplicated messages. You also need to be careful about how you pause the consumer: heartbeats need to be sent or your consumer group will rebalance continuously, i.e. you can't just sleep until the next minute. This is obviously only an option if you can't control the end consumer.
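A rough sketch of such a buffering/rate-limiting consumer, assuming 50 messages per 30-second window (broker address, topic names and the forward() call are placeholders; a batch that crosses the limit is still processed in full, so the cap is approximate):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class RateLimitedForwarder {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "rate-limited-forwarder");  // placeholder
        props.put("max.poll.records", "50");              // never fetch more than one window's worth
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("input-topic")); // placeholder
            long windowStart = System.currentTimeMillis();
            int forwardedInWindow = 0;

            while (true) {
                // Keep calling poll() even while paused so heartbeats are sent and no rebalance happens.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    forward(record); // placeholder: publish to the topic the end consumer reads
                    forwardedInWindow++;
                }
                if (forwardedInWindow >= 50) {
                    consumer.pause(consumer.assignment()); // stop fetching, keep the group membership alive
                }
                if (System.currentTimeMillis() - windowStart >= 30_000) {
                    windowStart = System.currentTimeMillis();
                    forwardedInWindow = 0;
                    consumer.resume(consumer.paused());
                }
            }
        }
    }

    private static void forward(ConsumerRecord<String, String> record) {
        // placeholder for producing the record onward
    }
}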

Activemq does not balance messages after some time

I'm using ActiveMQ (5.14.5) with Camel (2.13.4) because I still need Java 6.
I have a queue and 15 consumers. The messages sent to them are request-reply.
When I start the consumers, the messages are distributed one per consumer as soon as the messages arrive but, after some time, only one consumer receives the messages, the others stay idle and a lot of messages stay pending.
The consumers have this configuration:
concurrentConsumers=15&maxMessagesPerTask=1&destination.consumer.prefetchSize=0&transferException=true
The time spent processing each message can vary a lot because of our business rules, so I don't know if ActiveMQ has some rule that manages slow consumers and redirects everything to the one that is most "efficient".
The behaviour I was expecting is that all arriving messages start to be processed until all the consumers are busy, but that is not what is happening.
Does anybody know what is happening?
Your configuration has two eye-catching settings:
maxMessagesPerTask=1
If you did not intend to configure auto-scaling of the thread pool, you should remove this setting completely. It is unlimited by default, and it controls how long threads are kept for processing (scaling the thread pool up/down).
See also the Spring Docs about this setting
prefetchSize=0
Have you tried setting this to 1 so that every consumer just gets 1 message at a time?
The AMQ docs say about the prefetchSize:
Large prefetch values are recommended for high performance with high message volumes. However, for lower message volumes, where each message takes a long time to process, the prefetch should be set to 1. This ensures that a consumer is only processing one message at a time. Specifying a prefetch limit of zero, however, will cause the consumer to poll for messages, one at a time, instead of the message being pushed to the consumer.
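Applied to your endpoint options, removing maxMessagesPerTask and trying a prefetch of 1 would mean something like the following (the other options are kept as in your question):

concurrentConsumers=15&destination.consumer.prefetchSize=1&transferException=true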

What is the correct way to throttle ActiveMQ producers who send persistent messages in batches to a queue?

I have a producer which sends persistent messages in batches to a queue leveraging JMS transaction.
I have tested and found that Producer Flow Control is applied when using a batch size of 1. I could see my producer being throttled as per the memory limit I have configured for the queue. Here's my Producer Flow Control configuration:
<policyEntry queue="foo" optimizedDispatch="true"
producerFlowControl="true" memoryLimit="1mb">
</policyEntry>
The number of pending messages in the queue are in control which I see as the evidence for Producer Flow Control in action.
However, when the batch size is increased to 2, I found that this memory limit is not respected and the producer is NOT THROTTLED at all. The evidence being the number of pending messages in the queue continue to increase till it hits the storeUsage limit configured.
I understand this might be because the messages are sent in asynchronous fashion when the batch size is more than 1 even though I haven't explicitly set useAsyncSend to true.
ActiveMQ's Producer Flow Control documentation mentions that to throttle asynchronous publishers, we need to configure Producer Window Size in the producer which shall force the Producer to wait for acknowledgement once the window limit is reached.
However, when I configured Producer Window Size in my producer and attempted to send messages in batches, an exception is thrown and no messages were sent.
This makes me think and ask this question, "Is it possible to configure Producer Window Size while sending persistent messages in batches?".
If not, then what is the correct way to throttle the producers who send persistent messages in batches?
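For context, the kind of batched, transacted persistent send described above looks roughly like this (broker URL, queue name and payload are placeholders; the batch size of 2 is the one from the question where flow control stopped being applied):

import javax.jms.Connection;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class BatchSender {

    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616"); // placeholder
        Connection connection = factory.createConnection();
        connection.start();
        try {
            // Transacted session: the batch is handed to the broker on commit()
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            Queue queue = session.createQueue("foo");
            MessageProducer producer = session.createProducer(queue);
            producer.setDeliveryMode(DeliveryMode.PERSISTENT);

            int batchSize = 2;
            for (int i = 0; i < batchSize; i++) {
                producer.send(session.createTextMessage("payload-" + i));
            }
            session.commit(); // the whole batch becomes visible to the broker here
        } finally {
            connection.close();
        }
    }
}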
There is not really a way to throttle "max msgs per second" or similar. What you would do is to enable producer flow control and vm cursor, then set the memory limit on that queue (or possibly all queues if you wish) to some reasonable level.
You can decide in the configuration if the producer should hang or throw an exception if the queue memory limit has been reached.
<policyEntry queue="MY.BATCH.QUEUE" memoryLimit="100mb" producerFlowControl="true">
<pendingQueuePolicy>
<vmQueueCursor/>
</pendingQueuePolicy>
</policyEntry>
I found this problem in v5.8.0 but found this to be resolved in v5.9.0 and above.
From v5.9.0 onwards I found PFC is applied out of the box even for producers who send messages asynchronously.
Since batch send (where batch size > 1) is essentially an asynchronous operation, this applies there as well.
But the PFC wiki was confusing as it mentions that one should configure ProducerWindowSize for async producers if PFC were to be applied. However, I tested and verified that this was not needed.
I basically configured a per-destination limit of 1mb and sent messages in batches (with batch size of 100).
My producer was throttled out of the box without any additional configuration. The number of pending messages in the queue didn't increase and was under control.
With a simple Camel consumer consuming the messages (and appending them to a file), I found that with v5.8.0 (where I faced the problem), I could send 100k messages with the payload being 2k in 36 seconds. But most of them ended up as Pending messages.
But with v5.9.0, it took 176 seconds to send the same set of messages testifying the role played by PFC. And the number of pending messages never increased beyond 1000 in my case.
I also tested with v5.10.0 and v5.12.0 (the latest version at the time of writing) which worked as expected.
So if you are facing this problem, chances are that you are running ActiveMQ v5.8.0 or earlier. Simply upgrading to the latest version should solve this problem.
I thank the immensely helpful ActiveMQ mailing list folks for all their suggestions and help.
Thanks @Petter for your answer too. Sorry I didn't mention the version I was using in my question, otherwise I believe you could have spotted the problem straight away.
