How to balance publishers' requests with RabbitMQ? - java

Suppose you have multiple producers and one consumer that wants to receive persistent messages from all available publishers.
Producers work at different speeds. Let's say that system A produces 10 requests/sec and system B 1 request/sec. If you use a single queue, you will process 10 messages from A, then 1 message from B.
But what if you want to balance the load and process one message from A, then one message from B, and so on? Consuming from multiple queues is not a good option because we can't use wildcard binding in this case.
Update:
A queue per producer seems like the best approach. Producers don't know their speed, which changes constantly. With one queue per consumer I can subscribe to one topic and receive messages from all available publishers. But with a queue per producer I need to code the logic myself:
Get all available queues through management plugin (AMQP doesn't allow to list queues).
Filter by queue name.
Implement round robin strategy.
Implement notification mechanism to subscribe to new publishers that can appear at any moment.
Remove a queue that is no longer needed once its publisher has disappeared and the client has read all of its messages.
Well, it seems pretty easy, but I thought the broker could provide all of this functionality without any coding. With one queue I just create one persistent queue, bind it to a topic exchange, then start any number of publishers that send messages to the topic. This option works almost out of the box.
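The round robin step from the list above can be sketched without any broker at all. The class below is a minimal in-memory illustration, not RabbitMQ client code: `RoundRobinQueues`, `publish` and `drain` are hypothetical names, and each `Deque` stands in for one per-producer queue that a real consumer would read from the broker.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for "one queue per producer" consumed in round-robin order. */
final class RoundRobinQueues {
    // Insertion-ordered so the rotation order is deterministic.
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();

    /** Registers the producer's queue on first use and appends the message. */
    void publish(String producer, String message) {
        queues.computeIfAbsent(producer, k -> new ArrayDeque<>()).addLast(message);
    }

    /** Takes at most one message from each non-empty queue per pass until all are empty. */
    List<String> drain() {
        List<String> out = new ArrayList<>();
        boolean tookAny = true;
        while (tookAny) {
            tookAny = false;
            for (Deque<String> q : queues.values()) {
                String m = q.pollFirst();
                if (m != null) {
                    out.add(m);
                    tookAny = true;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        RoundRobinQueues rr = new RoundRobinQueues();
        for (int i = 1; i <= 3; i++) {
            rr.publish("A", "A" + i);
        }
        rr.publish("B", "B1");
        System.out.println(rr.drain());
    }
}
```

With three messages from A and one from B, `drain()` returns A1, B1, A2, A3: the slow producer's message is handled on the first pass instead of waiting behind everything from A.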

I know I'm late for the party, but still.
In Azure Service Bus terms it's called "partitioning" and it's based on the partition key. The best part is in Azure SB the receiving client is not aware of the partitioning, it simply subscribes to the single queue.
In RabbitMQ there is the X-Consistent-Hashing plugin ("rabbitmq_consistent_hash_exchange"), but unfortunately it's not that convenient. The consumers must be explicitly configured to consume from specific queues. If you have ten queues, you need to set up your consumers so that all ten are covered.
Another two options:
Random Exchange Type
Sharding Plugin
Bear in mind that with the Sharding Plugin even though it creates "one logical queue to consume" you'll have to have as many subscribers as there are virtual queues, otherwise some of the queues will be left unconsumed.

You can use the Priority Queue Support and assign a priority according to the producer's speed. The caveats are that the priority must be set with caution (for example, if the consumer is slower than system B, the consumer will only consume messages from B) and that producers must be aware of their producing speed.
Another option to consider is creating three types of queues according to the producing speed: HIGH, MEDIUM, LOW. The three queues are bound to the exchange with the binding key set according to the producing speed. It could be done using.
The consumer will consume messages from these three queues using a round robin strategy. The caveat is that producers must be aware of their producing speed.
But the best option may be a queue per producer, especially if the producers' speed is not stable and cannot be categorized. That way, producers do not need to know their producing speed.

Related

2 Spring @JmsListeners on 1 queue

I have 2 @JmsListener instances on 1 queue, and I want to take a fixed number of messages from the queue and then hold the rest as pending for some time for bulk processing. I have added a condition to check the number of pending messages, but with 2 listeners it fails. Also, I have to add this condition only inside the @JmsListener.
Please suggest how to add the logic of taking a fixed number of messages from the queue and holding the rest as pending to achieve throttling.
I don't believe you will be able to use Spring's @JmsListener to do what you want, because you simply don't have the control over the consumer that you need to fetch multiple messages and then process them all at once. A listener only gets one message at a time, and it is invoked as messages arrive, so you have no control over when and how you fetch the messages. In contrast, with a normal JMS MessageConsumer you can manually invoke receive() as many times as you like.
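The manual fetch-then-process idea can be sketched as follows. This is only an illustration of the control flow: a `BlockingQueue` stands in for the JMS consumer, and its timed `poll` plays the role of a timed `receive` call that returns null when nothing arrives in time. `BatchReceiver` and `receiveBatch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class BatchReceiver {
    /** Pulls up to maxBatch messages, stopping early once a timed poll returns nothing. */
    static <T> List<T> receiveBatch(BlockingQueue<T> source, int maxBatch, long timeoutMs) {
        List<T> batch = new ArrayList<>();
        while (batch.size() < maxBatch) {
            T msg;
            try {
                // Assumption: this mirrors a timed receive on a real consumer.
                msg = source.poll(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                break;
            }
            if (msg == null) {
                break; // nothing arrived within the timeout
            }
            batch.add(msg);
        }
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 1; i <= 5; i++) {
            queue.add("msg-" + i);
        }
        // Take a fixed number of messages; the rest stay pending in the queue.
        System.out.println(receiveBatch(queue, 3, 10));
        System.out.println("still pending: " + queue.size());
    }
}
```

The caller decides when the next batch is fetched, which is exactly the control a listener callback does not give you.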
Also, ActiveMQ will do its best to treat each consumer fairly and therefore distribute the same amount of messages to each. Generally speaking, it is bad for one consumer to get all (or most) the messages as it can starve the other consumers and waste resources. That said, you could potentially use consumer priority if you really needed some consumers to get more messages than others.

How to control the number of messages emitted by Apache Kafka per specific time interval?

I am new to Apache Kafka and I am trying to configure it so that it receives as many messages from the producer as possible but sends only a configured number of messages to the consumer per specific time interval.
In other words, how do I configure Apache Kafka to send only, for example, 50 messages per 30 seconds
to the consumer regardless of the number of messages, and in the next 30 seconds take another 50 messages from the cached messages, and so on?
If you have control over the consumer
You could use max.poll.records property to limit max number of records per poll() method call. And then you only need to ensure that poll() is called once in 30 seconds.
In general you can take a look at all available configuration properties here.
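As a rough sketch of the consumer-side setup, the properties could be assembled like this. The config keys are standard Kafka consumer settings; the broker address, group id, String deserializers and the `ThrottledConsumerConfig` class name are assumptions made for illustration.

```java
import java.util.Properties;

final class ThrottledConsumerConfig {
    /** Builds consumer properties that cap how many records a single poll() may return. */
    static Properties build(String bootstrapServers, String groupId, int maxRecordsPerPoll) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        p.setProperty("group.id", groupId);
        // Assumption: keys and values are plain strings.
        p.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        p.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Upper bound on the number of records returned by one poll() call.
        p.setProperty("max.poll.records", Integer.toString(maxRecordsPerPoll));
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical broker address and group id.
        Properties props = build("localhost:9092", "throttled-group", 50);
        System.out.println(props.getProperty("max.poll.records"));
    }
}
```

These properties would then be passed to the consumer's constructor. If the consumer pauses between polls, the gap should stay below `max.poll.interval.ms`, otherwise the group coordinator will consider the consumer failed and rebalance.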
If you cannot control consumer
Then the only option for you is to write messages as per your demand - write at most 50 messages in 30 seconds. There are no configuration options available. Only your application logic can achieve that.
Update: how to ensure poll() is called on schedule
The simplest way is to:
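while (true) {
    // With max.poll.records=50, each call returns at most 50 records.
    var records = consumer.poll(Duration.ofSeconds(1)); // needs java.time.Duration
    // .. do your stuff
    Thread.sleep(30000);
}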
You can make things more precise by measuring the processing time (i.e. from the poll call up to Thread.sleep()) and sleeping only for the remainder, so that a full cycle never takes more than 30 seconds.
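A minimal sketch of that timing adjustment, assuming a 30 second period. `PollPacer`, `remainingMillis` and `runCycle` are hypothetical names, and the `Runnable` stands in for the poll-and-process step.

```java
import java.util.concurrent.TimeUnit;

final class PollPacer {
    /** Time left in the period after work that ran from startNanos to endNanos; never negative. */
    static long remainingMillis(long periodMs, long startNanos, long endNanos) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0L, periodMs - elapsedMs);
    }

    /** Runs one cycle: do the work, then sleep for whatever is left of the period. */
    static void runCycle(Runnable pollAndProcess, long periodMs) throws InterruptedException {
        long start = System.nanoTime();
        pollAndProcess.run();
        Thread.sleep(remainingMillis(periodMs, start, System.nanoTime()));
    }

    public static void main(String[] args) throws InterruptedException {
        // Short period so the demo finishes quickly; the text above uses 30000 ms.
        runCycle(() -> System.out.println("poll and process here"), 200);
    }
}
```

If processing takes 12 seconds of a 30 second period, the loop sleeps for the remaining 18 seconds; if processing overruns the period, it does not sleep at all.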
The problem is that the producer doesn't really send messages to the consumer. There is a persistent Kafka topic in between, where the producer places its messages, and the producer doesn't care whether there is any consumer on the other side. The same holds from the consumer's perspective: it just subscribes to data from the topic and doesn't care whether there is a producer on the other side. So thinking about back-pressure from the consumer down to the producer, when there is messaging middleware in between, is the wrong direction.
On the other hand, it is not clear how those consumed messages may impact your third-party service. The point is that a Kafka consumer is single-threaded per partition, so all the messages from one partition are processed one by one in the same thread. This way you cannot send more than one message to your service at a time: the next one can be sent only when the previous one has been answered. So think about it: how is it even possible for your consumer application to exceed the rate limit?
However, if you have enough partitions and high concurrency on the consumer side, you really may end up with several requests to your service in parallel from different threads. For this purpose I would suggest taking a look at the Rate Limiter pattern. This library provides a good implementation: https://resilience4j.readme.io/docs/ratelimiter. It is much better to keep messages in the topic than to try to limit the producer somehow.
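To illustrate the pattern itself, here is a minimal fixed-window limiter in plain Java. It is a simplified sketch, not the resilience4j API: `FixedWindowRateLimiter` and `tryAcquire` are hypothetical names, and the clock is injected so the behaviour can be checked without waiting.

```java
import java.util.function.LongSupplier;

final class FixedWindowRateLimiter {
    private final int permitsPerWindow;
    private final long windowMs;
    private final LongSupplier clockMs;
    private long windowStart;
    private int used;

    FixedWindowRateLimiter(int permitsPerWindow, long windowMs, LongSupplier clockMs) {
        this.permitsPerWindow = permitsPerWindow;
        this.windowMs = windowMs;
        this.clockMs = clockMs;
        this.windowStart = clockMs.getAsLong();
    }

    /** Returns true if a call is allowed in the current window; safe to call from several threads. */
    synchronized boolean tryAcquire() {
        long now = clockMs.getAsLong();
        if (now - windowStart >= windowMs) {
            // A new window begins: reset the counter.
            windowStart = now;
            used = 0;
        }
        if (used < permitsPerWindow) {
            used++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FixedWindowRateLimiter limiter =
                new FixedWindowRateLimiter(2, 1000, System::currentTimeMillis);
        for (int i = 0; i < 3; i++) {
            System.out.println("request " + i + " allowed: " + limiter.tryAcquire());
        }
    }
}
```

Each consumer thread would call `tryAcquire()` before contacting the third-party service and back off when it returns false, leaving the unprocessed messages in the topic.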
To conclude: even if the consumer side is not your project, it is better to discuss with that team how to improve their consumer. You did your part well: the producer sends messages to the Kafka topic. What else can you do here?
Interesting use case, and I'm not sure why you need it, but there are two possible solutions:
1. To protect the cluster, you could use quotas, which limit bandwidth throughput rather than the number of messages: https://kafka.apache.org/documentation/#design_quotas .
2. If you need an exact number of messages per time frame, you could put a buffering service (rate limiter) in between, which consumes, pauses, and publishes messages to the topic that the end consumer reads. The rate limiter could consume the next 50 messages and then pause until the minute passes. This will increase the space used on your cluster because of the duplicated messages. You also need to be careful about how you pause the consumer: heartbeats need to be sent, or else your consumer will rebalance continuously, i.e. you can't just sleep until the next minute. This is obviously for the case where you can't control the end consumer.

Is Kafka the right solution for messages with dependencies?

We have messages which are dependent on each other. For example, say we have 4 messages: M1, M2, M1_update1 (should be processed only after M1 is processed), and M3 (should be processed only after M1 and M2 are processed).
In this example, only M1 and M2 can be processed in parallel; the others have to be sequential. I know messages in one partition of a Kafka topic are processed sequentially. But how do I know that M1 and M2 have been processed and that it is now time to push the M1_update1 and M3 messages to the topic? Is Kafka the right choice for this kind of use case? Any insights are appreciated!
Kafka is used as a pub-sub messaging system which is highly scalable and fault tolerant.
I believe using Kafka alone when your messages are interdependent could be a bad choice. The processing you require is condition based, so you probably need a routing or rules engine such as Apache Camel or Drools to achieve the end result.
You're basically describing a message queue that guarantees ordering. Kafka, by design, guarantees ordering only within a partition, not across the partitions of a topic, so you get total ordering only in the case you mention, where the topic has a single partition. In that case, though, you're not taking full advantage of Kafka's ability to maximize throughput by parallelizing data across partitions.
As far as messages being dependent on each other, that would require a logic layer that core Kafka itself doesn't provide. If I understand it correctly, and the processing happens after the message is consumed from Kafka, you would need some sort of notification on the consumer end, which would receive and process M1 and M2 and somehow notify the producer on the other side that it's now OK to send M1_update1 and M3. This is definitely outside the scope of what core Kafka provides. You could still use Kafka to build something like this, but there are probably other solutions that would work better for you.
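One possible shape for such a logic layer is a small gate that parks a message until everything it depends on has been processed. The sketch below is an in-memory illustration only and is not something Kafka provides: `DependencyGate`, `offer` and `markProcessed` are hypothetical names, and a real system would also need to persist this state across restarts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DependencyGate {
    private final Set<String> processed = new HashSet<>();
    // Parked message id -> dependencies that are still outstanding.
    private final Map<String, Set<String>> waiting = new LinkedHashMap<>();

    /** Returns true if the message can be processed now; otherwise parks it. */
    boolean offer(String id, Set<String> dependsOn) {
        Set<String> missing = new HashSet<>(dependsOn);
        missing.removeAll(processed);
        if (missing.isEmpty()) {
            return true;
        }
        waiting.put(id, missing);
        return false;
    }

    /** Records completion and returns parked messages whose dependencies are now all met. */
    List<String> markProcessed(String id) {
        processed.add(id);
        List<String> released = new ArrayList<>();
        waiting.entrySet().removeIf(e -> {
            e.getValue().remove(id);
            if (e.getValue().isEmpty()) {
                released.add(e.getKey());
                return true;
            }
            return false;
        });
        return released;
    }

    public static void main(String[] args) {
        DependencyGate gate = new DependencyGate();
        gate.offer("M1_update1", Collections.singleton("M1"));
        gate.offer("M3", new HashSet<>(Arrays.asList("M1", "M2")));
        System.out.println(gate.markProcessed("M1"));
        System.out.println(gate.markProcessed("M2"));
    }
}
```

In the example from the question, finishing M1 releases M1_update1, and finishing M2 afterwards releases M3.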

Maximum number of Active MQ Consumers on a Queue

I am setting up an application which needs to be scaled. I post messages to ActiveMQ and read messages from there.
Until now, I have used at most 3 concurrent consumers pointing to a queue (each consumer operating from a different physical machine).
I need to know the maximum number of consumers I can point to a queue in ActiveMQ.
Is there a maximum limit?
I found this link:
http://activemq.apache.org/multiple-consumers-on-a-queue.html
But it does not state anything about Maximum connections / Sessions / consumers. It only says One session per connection.
The JMS specification does not state any limit on the number of consumers. You can add as many consumers as you want for a given Queue or Topic.
The question is how many consumers you really need. Increasing the number of consumers will allow you to do more parallel processing, but you will face memory issues. For example, if you start thousands of consumers on a single machine, it is simply going to start thousands of threads, which will consume memory.
Also if you are having multiple consumers for a single Queue it is a good idea to have selectors to filter out messages from the queue so that you can have some control on messages and which listeners should consume them.
Any number of consumers can point to that queue, but only one consumer at a time will be able to access an object in the queue. Once it retrieves the object, that particular consumer will be disconnected and another consumer will get connected to your queue.
You can specify the size of the queue in your XML file. You can find it easily with a search engine; I don't remember the tag name exactly.

JMS - How do message selectors work with multiple queue and topic consumers?

Say you have a JMS queue, and multiple consumers are watching the queue for messages. You want one of the consumers to get all of a particular type of message, so you decide to employ message selectors.
For example, you define a property named targetConsumer to go in your JMS message header. Your message selector, which you apply to the consumer known as A, is something like WHERE targetConsumer = 'CONSUMER_A'.
It's clear that consumer A will now grab only messages with the property set as in the example. Will the other consumers be aware of that, though? In other words, will another consumer, unconstrained by a message selector, grab the CONSUMER_A messages if it looks at the queue before consumer A? Do I need to apply message selectors like WHERE targetConsumer <> 'CONSUMER_A' to the others?
I am RTFMing and gathering empirical data now, but was hoping someone might know off the top of their head.
When multiple consumers use the same queue, message selectors need to be configured correctly across these consumers so that there is no conflict in determining the intended consumer.
In the case of message-driven-beans (a consumer of JMS messages), the selector can be specified in the ejb-jar.xml file thereby allowing for the configuration to be done at deployment time (instead of the opposing view of specifying the message selector during development).
Edit: In real life, this would make sense when different consumers are responsible for processing messages containing the same headers (often generated by the same producer) written onto the same queue. For instance, message selectors could be used in a trading application, to differentiate between buy and sell orders, when the producer is incapable of writing the JMS messages onto two separate buy and sell queues.
Yes, another consumer which is not using any message selector will get messages intended for consumer A (or, for that matter, any message at the head of the queue). Hence, when sharing a queue, consumer applications must be disciplined and pick only those messages intended for them.
The 'first' JMS message consumer from a queue will pick up the message if the selector matches. What 'first' means is an implementation detail (could be round-robin, based on priority or network closeness). So when using selectors on queues you need to make sure that these selectors are 'non overlapping'.
More formally: no message may exist that matches two selectors on the same queue.
This is yet another disadvantage of queues versus topics - in practice you should always consider using topics first. With a topic each matching consumer receives the message.
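The non-overlap rule can be checked against sample messages before deployment. The sketch below models the two selectors from the question as plain Java predicates rather than real JMS selectors; `SelectorOverlapCheck` and `matches` are hypothetical names. It assumes the usual selector semantics in which a comparison against a missing property evaluates to unknown, so the message is not selected. Note also that an actual selector string is just the condition, for example targetConsumer = 'CONSUMER_A', without a WHERE keyword.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class SelectorOverlapCheck {
    // Stand-in for the selector: targetConsumer = 'CONSUMER_A'
    static final Predicate<Map<String, String>> ONLY_A =
            m -> "CONSUMER_A".equals(m.get("targetConsumer"));

    // Stand-in for: targetConsumer <> 'CONSUMER_A'
    // Assumption: a missing property compares as unknown, so it does not match.
    static final Predicate<Map<String, String>> NOT_A =
            m -> m.get("targetConsumer") != null
                    && !"CONSUMER_A".equals(m.get("targetConsumer"));

    /** Counts the selectors that accept the message: above 1 is an overlap, 0 means it is stranded. */
    static int matches(Map<String, String> props, List<Predicate<Map<String, String>>> selectors) {
        int count = 0;
        for (Predicate<Map<String, String>> s : selectors) {
            if (s.test(props)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Predicate<Map<String, String>>> selectors = Arrays.asList(ONLY_A, NOT_A);
        System.out.println(matches(Collections.singletonMap("targetConsumer", "CONSUMER_A"), selectors));
        System.out.println(matches(Collections.singletonMap("targetConsumer", "CONSUMER_B"), selectors));
        System.out.println(matches(Collections.<String, String>emptyMap(), selectors));
    }
}
```

Messages addressed to A or B each match exactly one selector, while a message without the targetConsumer property matches neither and would sit on the queue unless some consumer has no selector or an explicit IS NULL condition.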
