In Kafka docs, it is mentioned that the consumers are not Thread-Safe. To avoid this problem, I read that it is a good idea to run a consumer for every Java process. How can this be achieved?
The number of consumers is not defined, but can change according to need.
Thanks,
Alessio
You're right that the documentation specifies that Kafka consumers are not thread-safe. However, it also says that you should run consumers on separate threads, not processes. That's quite different. See here for an answer with more specifics, geared towards Java/JVM:
https://stackoverflow.com/a/15795159/236528
In general, you can have as many consumers as you want on a Kafka topic. Some of these might share a group id, in which case, all the partitions for that topic will be distributed across all the consumers active at any point in time.
There's much more detail in the Javadoc for the Kafka Consumer, linked at the bottom of this answer, but I copied the two thread/consumer models suggested by the documentation below.
1. One Consumer Per Thread
A simple option is to give each thread its own consumer instance. Here are the pros and cons of this approach (a minimal sketch follows the list):
PRO: It is the easiest to implement
PRO: It is often the fastest as no inter-thread co-ordination is needed
PRO: It makes in-order processing on a per-partition basis very easy to implement (each thread just processes messages in the order it receives them).
CON: More consumers means more TCP connections to the cluster (one per thread). In general Kafka handles connections very efficiently so this is generally a small cost.
CON: Multiple consumers means more requests being sent to the server and slightly less batching of data which can cause some drop in I/O throughput.
CON: The number of total threads across all processes will be limited by the total number of partitions.
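For concreteness, here is a minimal sketch of this model (not from the Javadoc): each thread owns its own KafkaConsumer and never shares it. The topic name and the process(...) method are placeholder assumptions.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerLoop implements Runnable {
    private final Properties props; // bootstrap.servers, group.id, deserializers, ...

    public ConsumerLoop(Properties props) { this.props = props; }

    @Override
    public void run() {
        // The consumer is created, used and closed entirely on this thread,
        // which is what makes the thread-safety constraint a non-issue.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (!Thread.currentThread().isInterrupted()) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> rec : records) {
                    process(rec); // placeholder for your per-record logic
                }
            }
        }
    }

    private void process(ConsumerRecord<String, String> rec) { /* ... */ }
}

You would then start one such Runnable per thread, e.g. via Executors.newFixedThreadPool(p), keeping the thread count at or below the total partition count.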
2. Decouple Consumption and Processing
Another alternative is to have one or more consumer threads that do all data consumption and hand off ConsumerRecords instances to a blocking queue consumed by a pool of processor threads that actually handle the record processing. This option likewise has pros and cons (a minimal sketch follows the list):
PRO: This option allows independently scaling the number of consumers and processors. This makes it possible to have a single consumer that feeds many processor threads, avoiding any limitation on partitions.
CON: Guaranteeing order across the processors requires particular care, as the threads execute independently: an earlier chunk of data may actually be processed after a later chunk simply due to thread scheduling. For processing that has no ordering requirements this is not a problem.
CON: Manually committing the position becomes harder, as it requires that all threads co-ordinate to ensure that processing is complete for that partition.
There are many possible variations on this approach. For example, each processor thread can have its own queue, and the consumer threads can hash into these queues using the TopicPartition to ensure in-order consumption and simplify commit.
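A minimal sketch of this second model, again with placeholder names: one consumer thread drains records into a bounded queue, and a fixed pool of processor threads takes from it. The bounded queue is what lets the consumer apply back pressure when the processors fall behind.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DecoupledConsumer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties(); // bootstrap.servers, group.id, ... (placeholders)
        BlockingQueue<ConsumerRecords<String, String>> queue = new ArrayBlockingQueue<>(100);

        // Processor pool: its size is independent of the partition count.
        ExecutorService processors = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 8; i++) {
            processors.submit(() -> {
                try {
                    while (true) {
                        ConsumerRecords<String, String> batch = queue.take(); // blocks until work arrives
                        batch.forEach(rec -> { /* process rec (placeholder) */ });
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Single consumer thread: the KafkaConsumer is only ever touched here.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                queue.put(consumer.poll(Duration.ofSeconds(1))); // put() blocks when the pool falls behind
            }
        }
    }
}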
In my experience, option #1 is the best for starting out, and you can upgrade to option #2 only if you really need it. Option #2 is the only way to extract the maximum performance from the Kafka consumer, but its implementation is more complex. So, give option #1 a try first, and see if it's good enough for your specific use case.
The full Javadoc is available at this link:
https://kafka.apache.org/23/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
I have a topic theimportanttopic with three partitions.
Is there an advantage to "forcing" the partition assignment?
For instance, I have three consumers in one group. I want consumer1 to always and only consume from partition-0, consumer2 to always and only consume from partition-1 and consumer3 to always and only consume from partition-2.
One consumer should not touch any other partition at any point in time except the one that was assigned.
A drawback I can think of is that when one of the consumers goes down, no one is consuming from the partition.
Let's suppose a fancy self-healing architecture is in place that can bring back any of those lost consumers very efficiently.
Would it be an advantage, knowing there won't be any partition reassignment cost to the healthy consumers? The healthy consumers can focus on their own partition, etc.
Are there any other pros and cons?
https://docs.spring.io/spring-kafka/reference/html/#tip-assign-all-parts
https://docs.spring.io/spring-kafka/reference/html/#manual-assignment
It seems the API allows forcing the partition assignment; I was wondering if this use case was one of the purposes of that design.
How do you know the "number" of each consumer? Based on your previous questions, you've either used Kubernetes or set concurrency in Spring Kafka. In either case, pods/threads of the same executable application rebalance across partitions... Therefore, you cannot scale them and pin them to specific partitions without extra external locking logic.
In my opinion, all executable consumers should be equally able to handle any partition.
Plus, as you pointed out, there's downtime if one stops.
But one use case is to exactly match the producer: you've produced data with some custom partitioner logic, so you need specific consumers to read only a subset of that data.
Also, assignment doesn't use consumer groups, so while there would be no rebalancing, it becomes impossible to monitor lag using tools like Burrow or the consumer-groups CLI. Lag would need to be gathered directly from the consumer metrics themselves.
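For reference, a minimal sketch of manual assignment with the plain consumer API (the topic name comes from the question; everything else is a placeholder): assign(...) bypasses the group coordinator entirely, so no rebalancing ever happens.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class PinnedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties(); // bootstrap.servers, deserializers, ... (placeholders)
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // consumer1 pins itself to partition 0; consumer2 and consumer3 would
            // do the same with partitions 1 and 2. No group coordination happens.
            consumer.assign(Collections.singletonList(new TopicPartition("theimportanttopic", 0)));
            while (true) {
                consumer.poll(Duration.ofSeconds(1)).forEach(rec -> { /* process (placeholder) */ });
            }
        }
    }
}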
Let me first describe my use-case.
I have topics [ T1 ... Tn ] to which the Kafka consumer(s) need to subscribe. For each topic, all the data passing through it is logically similar. Let's assume data in different topics has no correlation. But once consumed, all the data, irrespective of topic, receives the same treatment: it is fed to Elasticsearch using the BulkProcessor API. ES is set up as a multi-node cluster.
The Kafka consumer Javadoc mentions two different multithreading approaches. I'm leaning towards the first, i.e. the one-consumer-per-thread model. Assuming p partitions per topic, I'll have p consumer threads for each topic, so n × p threads in total. If I attach an independent bulk processor to each of these threads, I can choose to control the committed position manually, which saves me from data loss in case a bulk processor fails. But the downside is that the number of bulk processors might become too high and that might slow down Elasticsearch ingestion.
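A sketch of what each such thread could look like with manual offset control; BulkSink is a hypothetical wrapper around the Elasticsearch BulkProcessor whose flush() is assumed to throw on failure, and enable.auto.commit=false is assumed in the consumer properties.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

interface BulkSink { void add(String doc); void flush(); } // hypothetical bulk-processor wrapper

public class TopicWorker implements Runnable {
    private final Properties props;  // includes enable.auto.commit=false
    private final String topic;
    private final BulkSink bulkSink;
    private volatile boolean running = true;

    TopicWorker(Properties props, String topic, BulkSink bulkSink) {
        this.props = props; this.topic = topic; this.bulkSink = bulkSink;
    }

    @Override
    public void run() {
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList(topic));
            while (running) {
                ConsumerRecords<String, String> records = consumer.poll(1000); // 0.9.x-era poll(long)
                for (ConsumerRecord<String, String> rec : records) {
                    bulkSink.add(rec.value());
                }
                bulkSink.flush();      // throws on failure, so failed data is never committed
                consumer.commitSync(); // commit offsets only after a successful flush
            }
        }
    }
}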
The other approach I'm considering is to have only one thread per topic, so each thread listens to p partitions and writes to one bulk processor. In that case I'd have to use auto-commit for offsets, and I might lose data on a bulk-processor failure.
I'd like to know which approach is better, or is there a third approach better than both of these?
Kafka v0.9.0.x and ES v2.3.x
One EventHandler (DatabaseConsumer) of the Disruptor calls stored procedures in the database, which is so slow that it blocks the Disruptor for some time.
Since I need the Disruptor to keep running without blocking, I am thinking of adding an extra queue so that the EventHandler could serve as a producer and a newly created thread could serve as a consumer to handle the database work asynchronously, without affecting the Disruptor.
Here are some constraints:
The object that the Disruptor passes to the EventHandler is around 30 KB, and the number of these objects is about 400K. In theory, the total size of the objects that need to be handled is around 30 KB × 400K ≈ 12 GB. So the extra queue should be big enough for them.
Since performance matters, GC pauses should be avoided.
The heap size of the Java program is only 2GB.
I'm thinking of a text file as an option: the EventHandler (producer) writes the objects to the file, and the consumer reads from it and calls the stored procedure. The problem is how to handle reaching the end of the file and how to detect newly appended lines.
Has anyone solved this situation before? Any advice?
The short answer is: size your disruptor to cope with the size of your bursts, not your entire volume. Bear in mind the disruptor can just contain a reference to the 30 KB object; the entire object does not need to be in the ring buffer.
Any form of buffering before your database will require memory for the buffered data; the disruptor offers you the option of applying back pressure to the rest of the system when the database has fallen too far behind. That is to say, you can slow down the inputs to the disruptor.
The other option, for spooling to files, is to look at Java Chronicle, which uses memory-mapped files to persist things to disk.
The much more complicated answer is to take advantage of the batching effects of the disruptor so that your DB can catch up, i.e. using an EventHandler which collects a batch of events together and submits them to the database as one unit (see the sketch below).
This practice allows the EventHandler to become more efficient as things back up, thus increasing throughput.
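Something like this, as a sketch: DbEvent and the stored-procedure call are hypothetical names, while the endOfBatch flag is part of the real EventHandler interface.

import java.util.ArrayList;
import java.util.List;
import com.lmax.disruptor.EventHandler;

class DbEvent {} // hypothetical event type carried by the ring buffer

public class BatchingDbHandler implements EventHandler<DbEvent> {
    private final List<DbEvent> batch = new ArrayList<>();

    @Override
    public void onEvent(DbEvent event, long sequence, boolean endOfBatch) {
        batch.add(event);
        // endOfBatch is true when no further events are immediately available,
        // so the batch naturally grows as the database falls behind.
        if (endOfBatch) {
            storedProcedureBatch(batch); // one database round-trip for the whole backlog
            batch.clear();
        }
    }

    private void storedProcedureBatch(List<DbEvent> events) { /* JDBC batch call (placeholder) */ }
}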
Short answer: don't use disruptor. Use a distributed MQ with retransmission support.
Long answer: If you have fast producers with slow consumers you will need some sort of retransmission mechanism. I don't think you can escape from that unless you can tolerate nasty blocks (i.e. huge latencies) in your system. That's when distributed MQs (messaging queues) come into play. Disruptor is not a distributed MQ, but you could try to implement something similar. The idea is:
All messages are sequenced and processed in order by the consumer
If the queue gets full, messages are dropped
If the consumer detects a message gap it will request a retransmission of the lost messages, buffering the future messages until it receives the gap
With that approach the consumer can be as slow as it wants because it can always request the retransmission of any message it lost at any time. What we are missing here is the retransmission entity. In a distributed MQ that will be a separate and independent node persisting all messages to disk, so it can replay back any message to any other node at any time. Since you are not talking about an MQ here, but about disruptor, then you will have to somehow implement that retransmission mechanism yourself on another thread. This is a very interesting problem without an easy answer or recipe. I would use multiple disruptor queues so your consumer could do something like:
Read from the main channel (i.e. main disruptor queue)
If you detect a sequence gap, go to another disruptor queue connected to the replayer thread. You will actually need two queues there, one to request the missing messages and another one to receive them.
The replayer thread would have another disruptor queue from which it receives all messages and persists them to disk.
You are left to make sure your replayer thread can write messages to disk fast enough. If it cannot, then there is no escape besides blocking the whole system. Fortunately disk I/O can be done very fast if you know what you are doing.
You can forget all I said if you can just afford to block the producers if the consumers are slow. But if the producers are getting messages from the network, blocking them will eventually give you packet drops (UDP) and probably an IOException (TCP).
As you can see, this is a very interesting question with a very complicated answer. At Coral Blocks we have experience developing distributed MQs like that on top of CoralReactor. You can take a look at some of the articles we have on our website.
I have to write a heavy-load system with a pretty easy task to do, so I decided to split the tasks across multiple workers in different locations (or clouds). To communicate, I want to use a RabbitMQ queue.
In my system there will be two kinds of software nodes: schedulers and workers. Schedulers will take user input from queue_input, split it into smaller tasks and put these smaller tasks into workers_queue. Workers read this queue and 'do the thing'. I used round-robin load balancing here, and all works pretty well until some worker crashes; then I lose information about task completion (it's not allowed to do a single operation twice; each task contains a pack of 50 iterations of worker code with different data).
I'm considering something like technical_queue, another channel for scheduler-worker communication, and I wonder how to design it in a good way. I used the tutorials from the RabbitMQ page, so my worker thread looks like:
while (true) {
    message = consume(QUEUE, ...);
    handle(message); // do 50 simple tasks in a loop for the data in this message
}
How can I handle the second queue? Another thread with some while(true) {} loop, or is there a better solution? Maybe I should reuse the existing queue with a topic exchange? (But I wanted an independent way of communicating while handling the task, which may take some time.)
You should probably take a look at spring-amqp (doc). I hate to tell you to add a layer, but that Spring library takes care of the threading issues and the management of threads with its SimpleMessageListenerContainer. Each container goes to a queue, and you can specify the number of threads (i.e. workers) per queue.
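For instance, a minimal sketch assuming a local broker and the workers_queue from the question (handle(...) stands in for your 50-iteration task loop):

import org.springframework.amqp.core.Message;
import org.springframework.amqp.core.MessageListener;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;

public class WorkerMain {
    public static void main(String[] args) {
        CachingConnectionFactory cf = new CachingConnectionFactory("localhost"); // placeholder broker
        SimpleMessageListenerContainer container = new SimpleMessageListenerContainer(cf);
        container.setQueueNames("workers_queue");
        container.setConcurrentConsumers(5); // number of worker threads on this queue
        container.setMessageListener((MessageListener) message -> handle(message));
        container.start();
    }

    static void handle(Message message) { /* do the 50 simple tasks for this message (placeholder) */ }
}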
Alternatively, you can make your own using an ExecutorService, but you will probably end up rewriting what SimpleMessageListenerContainer does. You could also just start (via the OS or batch scripts) more processes, and that will add more consumers to each queue.
As far as queue topology is concerned, it is entirely dependent on business logic/concerns and generally less on performance needs. More often you have more queues for business reasons and more workers for performance reasons, but if a queue gets backed up with the same type of message, consider giving that type of message its own queue. What you're describing sounds like two queues with multiple consumers on your worker queue.
Other than the threading issue and queue topology I'm not entirely sure what else you are asking.
I would recommend you create a second queue consumer:
consumer1 -> queue_process
consumer2 -> queue_process
Both consumers should listen to the same queue.
Greetings, I hope this helps.
Which Java blocking queue is best for multiple producer and single or multiple consumers scenarios?
I am testing with LinkedBlockingQueue but I am getting an OutOfMemoryError exception.
I am trying to achieve the following things.
Producers create an object and put it in a queue.
Consumers grab the data from the queue and insert it into the database. There will be 400 producers, and I can adjust the number of consumers as I wish.
Let me know if you have any ideas.
Update
Producer: it should listen to a server socket. It reads the data from the socket, constructs the objects (domain objects) and puts them in a queue.
Consumer: takes the objects from the queue and inserts them into the DB (Hibernate and connection pooling supported).
This is my real environment. The process should be able to handle at least 200 records/sec. I am testing the scalability of the process and how to improve it. I hope this gives a better idea.
Helpful links :
vmoptions
Monitoring and Managing Java SE 6 Platform Applications
BlockingQueue
The problem is not which Queue implementation you use but rather how to approach the problem of throttling your producers if your consumers cannot keep up. One possible solution is to create a LinkedBlockingQueue with a fixed capacity, and have your producers call offer(E e), which will return false if the queue is full.
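A sketch of that pattern; DataItem stands in for your domain object, and the capacity of 10000 is an arbitrary assumption you would tune to your burst size.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ThrottledPipeline {
    static class DataItem {} // placeholder domain object built from the socket data

    static final BlockingQueue<DataItem> QUEUE = new LinkedBlockingQueue<>(10_000);

    // Producer side: offer() returns false instead of letting the queue grow without bound.
    static void produce(DataItem item) throws InterruptedException {
        if (!QUEUE.offer(item)) {
            // Queue full: apply back pressure by blocking for a while instead of dropping.
            QUEUE.offer(item, 5, TimeUnit.SECONDS);
        }
    }

    // Consumer side: take() blocks until an element is available.
    static void consumeLoop() throws InterruptedException {
        while (true) {
            DataItem next = QUEUE.take();
            insertIntoDatabase(next); // placeholder for the Hibernate insert
        }
    }

    static void insertIntoDatabase(DataItem item) { /* DB work (placeholder) */ }
}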
Another possible solution is to tailor the number of producers and consumers accordingly.
Is each producer a separate thread? Don't forget that each thread will allocate (by default) 512 KB of memory for its stack (in your case requiring 200 MB of VM memory just for the threads). You can reduce this via -Xss.
Alternatively, just how big are the objects you're queuing ? I don't think you have a queue problem so much as some sort of scaling issue - e.g. producers producing faster than consumers can consume.
It seems that LinkedBlockingQueue is the best choice for your problem. I would suggest running your program with the switch -XX:+HeapDumpOnOutOfMemoryError and analyzing the dump file to see what caused the problem. Then you'll be able to solve it much more easily.