I'm using Pulsar for communication between services and I'm experiencing flakiness in a quite simple test of producers and consumers.
In JUnit 4 test, I spin up (my own wrappers around) a ZooKeeper server, a BookKeeper bookie, and a PulsarService; the configurations should be quite standard.
The test can be summarized in the following steps:
build a producer;
build a consumer (say, a reader of a Pulsar topic);
check the message backlog (using precise backlog);
this is done by getting the current subscription via PulsarAdmin#topics#getStats#subscriptions
I expect it to be 0, as nothing was sent on the topic, but sometimes it is 1, but this seems another problem...
build a new producer and synchronously send a message onto the topic;
build a new consumer and read the messages on the topic;
I expect a backlog of one message, and I actually read one
build a new producer and synchronously send four messages;
fetch again the messages, using the messageID read at step 5 as start message ID;
I expect a backlog of four messages here, and most of the time this value is correct, but running the test about ten times I consistently get 2 or 5
I tried debugging the test, but I cannot figure out where those values come from; did I misunderstand something?
Things you can try if not already done:
Ask for precise backlog measurement. By default, it's only estimated as getting the precise measurement is a costlier operation. Use admin.topics().getStats(topic, true) for this. (See https://github.com/apache/pulsar/blob/724523f3051def9577d6bd27697866c99f4a7b0e/pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java#L862)
Deactivate batching on the producer side. The number returned in msgBacklog is the number of entries so multiple messages batched in a single entry will count as 1. See relevant issue : https://github.com/apache/pulsar/issues/7623. It can explain why you see a value of 2 for the msgBacklog if the 4 messages have been put in the same batch. Beware that deactivating batching can have a huge impact on performance.
Related
I am processing messages from Kafka in a standard processing loop:
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records) {
processMessage(record);
}
}
What should I do if my Kafka Consumer gets into a timeout while processing the records? I mean the timeout controlled by the property session.timeout.ms
When this happens, my consumer should stop processing the records, because it would lose its partitions and the records that it processes could be already processed by another consumer. If the original consumer writes some processing results into a database, it could overwrite the records produced by the "new" consumer that got the partitions after my original consumer timed out.
I know about the ConsumerRebalanceListener, but from my understanding its method onPartitionsLost would only be called after I call the poll method from the consumer. Therefore this doesn't help me to stop the processing loop of the batch of records that I received from the previous poll.
I would expect that the heartbeat thread could notify me that it was not able to contact the broker and that we have a session timeout in the consumer, but there doesn't seem to be anything like that...
Am I missing something?
Adding this as an answer as it would be too long in a comment.
Kafka has a few ways that can be used to process messages
At most once;
At least once; and
Exactly once.
You are describing that you would like to use kafka as exactly once semantics (which by the way is the least common way of using kafka). Also producers need to play nicely as by default kafka can produce the same message more than once.
It's a lot more common to build services that use the at least once mechanism, in this way you can receive (or process) the same message more than once but you need to have a way to deduplicate them (it's the same idea behind idempotency on http APIs). You'll need to have something in the message that is unique and have register that that id has been processed already. If the payload has nothing you can use to deduplicate them, you can add a header on the message and use that.
This is also useful in the scenario that you have to reset the offset, so the service can go through old messages without breaking.
I would suggest you to google a bit for details on how to implement the above.
Here's a blog post from confluent about developing exactly once semantics Improved Robustness and Usability of Exactly-Once Semantics in Apache Kafka and the Kafka docs explaining the different semantics.
About the point of the ConsumerRebalanceListener, you don't need to do anything if you follow the solution of using idempotency in the consumer. Rebalances also happen when an app crashes, and in that scenario the service might have processed some records, but not committed them yet to Kafka.
A mini tip I give to everyone who is starting with Kafka. Kafka looks simple from the outside but it's a complex technology. Don't use it in production until you know the nitty gritty details of how it works including have done some good amount of negative testing (unless you are ok with losing data).
I have a spring cloud application running GCP PubSub messaging. I've got 2 message inbound channels that is subscribed to 2 different subscribers. The problem I face during load/stress test of the application is that, with a specific no.of threads set as below :
spring.cloud.gcp.pubsub.subscriber.executor-threads: 350
spring.cloud.gcp.pubsub.subscriber.parallel-pull-count: 2
spring.cloud.gcp.pubsub.subscriber.max-acknowledgement-threads: 700
when the processes pulled by messages of channel 1 are busy, I don't have sufficient threads for channel 2 to pull messages. The solution would be is to restrict/configure no.of threads for each channel. I am finding a very hard time to figure this out. Please do help me out here ! Below are the channels I was referring to :
#Bean
#ServiceActivator(inputChannel = "pubsubInputChannel1")
public MessageHandler extractionMessageReceiver() {
return message -> {
// do something
};
}
#Bean
#ServiceActivator(inputChannel = "pubsubInputChannel2")
public MessageHandler extractionMessageReceiver() {
return message -> {
// do something
};
}
Note, the subscriber thread remains busy until the end of a particular process pulled by a message.
I had the following problem: when there were a lot of messages and they queued up, the actuator health check for pubsub stopped working. My assumption was that all the executor threads were busy handling the messages and the check run into a deadline exceeded exception.
The following flow control properties helped me to fix the issue:
Config
Description
spring.cloud.gcp.pubsub.[subscriber,publisher.batching].flow-control.max-outstanding-element-count
Maximum number of outstanding elements to keep in memory before enforcing flow control
spring.cloud.gcp.pubsub.[subscriber,publisher.batching].flow-control.max-outstanding-request-bytes
Maximum number of outstanding bytes to keep in memory before enforcing flow control.
From https://docs.spring.io/spring-cloud-gcp/docs/1.1.0.M1/reference/html/_spring_cloud_gcp_for_pub_sub.html
When I set the max outstanding element count to 100 everything worked fine.
I think the outstanding messages get pulled with a stream. And with the properties above we can control that not all messages get processed at once. Instead, we split them up into, for example, 100 messages each. Maybe it will also switch between the channels. Quote from https://medium.com/google-cloud/things-i-wish-i-knew-about-google-cloud-pub-sub-part-2-b037f1f08318:
Note, streaming pull only guarantees flow control on a best-effort basis. Say you’ve noted your application can only handle 100 messages in any one period, so you set max outstanding messages to 100. The client will pause once it has pulled in 100 messages, which works most of the time. However, if you then publish 500 messages in a single publish batch, the client will receive all 500 messages at once but only be able to process 100 at a time, potentially leading to a growing backlog of expired messages. This is because streaming pull can’t split up messages from a single publish batch. To avoid this, either increase your number of subscribers or decrease your batch sizes to match subscriber message processing capacity while publishing.
Could these parameters maybe solve your problem?
I´m using activemq(5.14.5) with camel(2.13.4) because I still need java 6.
I have a queue and 15 consumers. The messages sent to them are request reply.
When I start the consumers, the messages are distributed one per consumer as soon as the messages arrive but, after some time, only one consumer receives the messages, the others stay idle and a lot of messages stay pending.
The consumers have this configuration:
concurrentConsumers=15&maxMessagesPerTask=1&destination.consumer.prefetchSize=0&transferException=true
The time spent to process each message can varies a lot because of our business rule so, I don´t know if activemq has some rule that manage slow consumers and redirect to only one that is more "efficient".
The behaviour that I was expecting is that all the messages that arrives, start to process until all the consumers are full, but it is not what is happening.
Anybody knows what is happening?
Following is an image about what is happening:
Your configuration has two eye-catching settings:
maxMessagesPerTask=1
If you did not intend to configure auto-scaling the threadpool, you should remove this setting completely. Is is by default unlimited and it sets how long to keep threads for processing (scaling up/down threadpool).
See also the Spring Docs about this setting
prefetchSize=0
Have you tried setting this to 1 so that every consumer just gets 1 message at a time?
The AMQ docs say about the prefetchSize:
Large prefetch values are recommended for high performance with high message volumes. However, for lower message volumes, where each message takes a long time to process, the prefetch should be set to 1. This ensures that a consumer is only processing one message at a time. Specifying a prefetch limit of zero, however, will cause the consumer to poll for messages, one at a time, instead of the message being pushed to the consumer.
I have below configuration for rabbitmq
prefetchCount:1
ack-mode:auto.
I have one exchange and one queue is attached to that exchange and one consumer is attached to that queue. As per my understanding below steps will be happening if queue has multiple messages.
Queue write data on a channel.
As ack-mode is auto,as soon as queue writes message on channel,message is removed from queue.
Message comes to consumer,consumer start performing on that data.
As Queue has got acknowledgement for previous message.Queue writes next data on Channel.
Now,my doubt is,Suppose consumer is not finished with previous data yet.What will happen with that next data queue has written in channel?
Also,suppose prefetchCount is 10 and I have just once consumer attached to queue,where these 10 messages will reside?
The scenario you have described is one that is mentioned in the documentation for RabbitMQ, and elaborated in this blog post. Specifically, if you set a sufficiently large prefetch count, and have a relatively small publish rate, your RabbitMQ server turns into a fancy network switch. When acknowledgement mode is set to automatic, prefetch limiting is effectively disabled, as there are never unacknowledged messages. With automatic acknowledgement, the message is acknowledged as soon as it is delivered. This is the same as having an arbitrarily large prefetch count.
With prefetch >1, the messages are stored within a buffer in the client library. The exact data structure will depend upon the client library used, but to my knowledge, all implementations store the messages in RAM. Further, with automatic acknowledgements, you have no way of knowing when a specific consumer actually read and processed a message.
So, there are a few takeaways here:
Prefetch limit is irrelevant with automatic acknowledgements, as there are never any unacknowledged messages, thus
Automatic acknowledgements don't make much sense when using a consumer
Sufficiently-large prefetch when auto-ack is off, or any use of autoack = on will result in the message broker not doing any queuing, and instead doing routing only.
Now, here's a little bit of expert opinion. I find the whole notion of a message broker that "pushes" messages out to be a little backwards, and for this very reason- it's difficult to configure properly, and it is unclear what the benefit is. A queue system is a natural fit for a pull-based system. The processor can ask the broker for the next message when it is done processing the current message. This approach will ensure that load is balanced naturally and the messages don't get lost when processors disconnect or get knocked out.
Therefore, my recommendation is to drop the use of consumers altogether and switch over to using basic.get.
I have a Java application with a number of components communicating via JMS (ActiveMQ). Currently the application and the JMS Hub are on the same server although we eventually plan to split out the components for scalability. Currently we are having significant issues with performance, all seemingly around JMS, most notably, and the focus of this question is the amount of time it is taking to publish a message to a topic.
We have around 50 dynamically created topics used for communication between the components of the application. One component reads records from a table and processes them one at a time, the processing involves creating a JMS Object message and publishing it to one of the topics. This processing could not keep up with the rate at which records were being written to the source table ~23/sec, so we changed the processing to create the JMS Object message and add it to a queue. A new thread was created which read from this queue and published the message to the appropriate topic. Obviously this does not speed the processing up but it did allow us to see how far behind we were getting by looking at the size of the queue.
At the start of the day no messages are going through the whole system, this quickly ramps up from 1560000 (433/sec) messages through the hub in the first hour to 2100000 (582/sec) in the 3rd hour and then staying at that level. At the start of the first hour the message publishing from the component reading records from the database table keeps up however, by the end of that hour there are 2000 messages in the queue waiting to be sent and by the 3rd hour the queue has 9000 messages in it.
Below are the appropiate sections of the code which send the JMS messages, any advice on what we are doing wrong or how we can improve this performance are much appreciated. Looking at stats on the web JMS should be able to easily handle ~1000-2000 large messages/sec or ~10000 small messages/sec. Our messages are around 500 bytes each so I imagine sit somewhere in the middle of that scale.
Code for getting the publisher:
private JmsSessionPublisher getJmsSessionPublisher(String topicName) throws JMSException {
if (!this.topicPublishers.containsKey(topicName)) {
TopicSession pubSession = (ActiveMQTopicSession) topicConnection.createTopicSession(false, Session.AUTO_ACKNOWLEDGE);
ActiveMQTopic topic = getTopic(topicName, pubSession);
// Create a JMS publisher and subscriber
TopicPublisher publisher = pubSession.createPublisher(topic);
this.topicPublishers.put(topicName, new JmsSessionPublisher(pubSession, publisher));
}
return this.topicPublishers.get(topicName);
}
Sending the message:
JmsSessionPublisher jmsSessionPublisher = getJmsSessionPublisher(topicName);
ObjectMessage objMessage = jmsSessionPublisher.getSession().createObjectMessage(messageObj);
objMessage.setJMSCorrelationID(correlationID);
objMessage.setJMSTimestamp(System.currentTimeMillis());
jmsSessionPublisher.getPublisher().publish(objMessage, false, 4, 0);
Code which adds messages to the queue:
List<EventQueue> events = eventQueueDao.getNonProcessedEvents();
for (EventQueue eventRow : events) {
IEvent event = eventRow.getEvent();
AbstractEventFactory.EventType eventType = AbstractEventFactory.EventType.valueOf(event.getEventType());
String topic = event.getTopicName() + topicSuffix;
EventMsgPayload eventMsg = AbstractEventFactory.getFactory(eventType).getEventMsgPayload(event);
synchronized (queue) {
queue.add(new QueueElement(eventRow.getEventId(), topic, eventMsg));
queue.notify();
}
}
Code in the thread removing items from the queue:
jmsSessionFactory.publishMessageToTopic(e.getTopic(), e.getEventMsg(), Integer.toString(e.getEventMsg().hashCode()));
publishMessageToTopic executes the 'Sending the message' code above.
Other JMS implementations are an option if the consensus is that ActiveMQ may not be the best option.
Thank you,
James
We do not use ActiveMQ, but we ran into similar issues, we discovered that the issues were with the back-end processing and not with the Java side. There could be multiple issues here:
The program processing the messages from the Queue could be slow (e.g. CICS on mainframe) it might not be able to keep up with the messages that are sent to the queue. One possible solution for this is to increase the processing power (or optimize the back end code which processes the messages)
Check the messages on the queue, sometimes there are are lots of uncommitted poison messages on the queue, we use a separate queue for such messages.
It would nice to know the answers to the questions asked by Karianna.
It's not 100% clear where you are experiencing the slow performance, but it sounds like what you are describing is slowness in publishing the messages. Are you creating a new publisher every time you publish a message? If so, this is terribly inefficient and you should consider creating one publisher and use it over and over to send messages. Furthermore, if you are sending persistent messages, then you are probably using synchronous sends to the broker. You might want to consider using asynchronous sends to speed things up. For more info, see the doc about Async Sends
Also, how is the performance of the consumers? How many consumers are being used? Are they able to keep pace with the rate at which messages are being published?
Additionally, what is the broker configuration that you are using? Has it been tuned at all?
Bruce
Although this is an old question, there is one very very important advice missing:
Investigate the amount of topics and queues that you have.
ActiveMQ keeps subscription topics in separate threads. Particularly, when you have large amounts of different topics, this will drag down any server. Think about using JMS selectors instead.
I ran into a similar situation where I had thousands of market data messages per second. When I naively dumped each message into a market instrument specific channel, the server was able to stand about an hour before it was spitting out error messages to the message producers. I changed the design to have ONE channel "MARKET_DATA" and I then set header properties on all produced messages and set a selector on the consumer side to select just the messages that I want. Note that my selector is in SQL like syntax and runs on the server though ... (yeah, let's skip the CEP marketing hype bashing) ...