Read 1 message concurrently from multiple Kafka topics - java

I set the concurrency to 1 for my Kafka listener.
ConcurrentKafkaListenerContainerFactory<String, Map<String, Object>> factory =
        new ConcurrentKafkaListenerContainerFactory<>();
factory.setConcurrency(concurrency);
factory.setConsumerFactory(consumerFactory());
factory.setRetryTemplate(retryTemplate());
I am listening to 3 different topics:
@KafkaListener(topics = "#{'${kafka.consumer.topic.name}'.split(',')}", containerFactory = "kafkaListenerContainerFactory")
public void listen(@Payload Map<String, Object> conciseMap,
        @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition,
        @Header(KafkaHeaders.OFFSET) long offset,
        Acknowledgment ack) {
    processMessage(conciseMap, partition, offset, ack, false);
}
In this case, will the listener read one message from the first topic and, once it is processed, read one message from the next topic, and so on? Or will it concurrently process one message from each topic?
If it is the former, is there a way to read one message concurrently from all the topics without creating multiple listeners?

There is no guarantee how the Kafka broker will allocate the partitions across the container threads; if each topic has only one partition, they will probably all be allocated to the same container thread. That's what happened when I ran a test with container concurrency=3...
2017-10-31 16:40:26.066 INFO 35202 --- [ntainer#0-2-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[]
2017-10-31 16:40:26.066 INFO 35202 --- [ntainer#0-1-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[]
2017-10-31 16:40:26.079 INFO 35202 --- [ntainer#0-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[bar-0, foo-0, baz-0]
With 10 partitions per topic, I got this distribution...
2017-10-31 16:46:19.279 INFO 35900 --- [ntainer#0-1-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[foo10-5, foo10-6, foo10-4, baz10-5, baz10-4, baz10-6, bar10-5, bar10-4, bar10-6]
2017-10-31 16:46:19.279 INFO 35900 --- [ntainer#0-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[bar10-1, bar10-0, bar10-3, bar10-2, baz10-1, baz10-0, baz10-3, baz10-2, foo10-3, foo10-1, foo10-2, foo10-0]
2017-10-31 16:46:19.279 INFO 35900 --- [ntainer#0-2-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[baz10-9, baz10-8, baz10-7, bar10-9, bar10-8, foo10-9, bar10-7, foo10-7, foo10-8]
As you can see, some partitions from each topic were allocated to each thread. But two of the threads got 9 partitions total while one got 12.
If you want complete control, I would suggest a listener per topic.
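A minimal sketch of that approach, reusing the question's processMessage(...) method (the per-topic property names here are illustrative, not from the question):
import java.util.Map;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.messaging.handler.annotation.Payload;
import org.springframework.stereotype.Component;

@Component
public class PerTopicListeners {

    @KafkaListener(topics = "${kafka.consumer.topic.one}", containerFactory = "kafkaListenerContainerFactory")
    public void listenTopicOne(@Payload Map<String, Object> map,
            @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition,
            @Header(KafkaHeaders.OFFSET) long offset, Acknowledgment ack) {
        processMessage(map, partition, offset, ack, false);
    }

    @KafkaListener(topics = "${kafka.consumer.topic.two}", containerFactory = "kafkaListenerContainerFactory")
    public void listenTopicTwo(@Payload Map<String, Object> map,
            @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition,
            @Header(KafkaHeaders.OFFSET) long offset, Acknowledgment ack) {
        processMessage(map, partition, offset, ack, false);
    }

    // Each @KafkaListener gets its own container (and thread), so the topics
    // are consumed independently even with concurrency = 1.
    private void processMessage(Map<String, Object> map, int partition, long offset,
            Acknowledgment ack, boolean retry) {
        // ... same processing as in the question
    }
}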

You don't need to create multiple listeners; you only need a concurrency as large as the total number of partitions across all the topics (or even larger).
That many KafkaMessageListenerContainer instances will be spun up, and each of them will work in its own thread. You can still use the same @KafkaListener method. As long as you are stateless there, you don't have any problem with the concurrency.
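For example, with the question's 3 topics and one partition each, a concurrency of 3 gives each partition a chance at its own thread. A sketch, reusing consumerFactory() and retryTemplate() from the question (assumed to live in a @Configuration class):
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Map<String, Object>> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, Map<String, Object>> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    // total partitions across all subscribed topics
    factory.setConcurrency(3);
    factory.setConsumerFactory(consumerFactory());
    factory.setRetryTemplate(retryTemplate());
    return factory;
}
As the test in the first answer shows, though, the assignor decides which partitions land on which thread, so an even one-partition-per-thread spread is not guaranteed.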

Related

KTable causes unsubscribe from topics

I'm writing a basic Kafka Streams app in Java which reads wikipedia events provided by a producer and attempts to count the number of created and recently changed events according to user type (bot or human).
I created a custom serde for the wikipedia events and am able to successfully print both the created and modified events to the screen from my KStreams.
My next step was to create a KTable in which I will count the created events per user type.
It seems that after the KTable is created the rest of the code does not execute.
I don't get an error message and my app seems to be running, but nothing is printed and maybe not even processed.
My code is as follows:
StreamsBuilder builder = new StreamsBuilder();
KStream<String, WikiEvent> allEvents =
builder.stream(topicList, Consumed.with(Serdes.String(), WikiEventSerdes.WikiEvent()));
KStream<String, WikiEvent> createEvents = allEvents.filter((key, value) -> value.getStream().equals("create"));
KStream<String, WikiEvent> changeEvents = allEvents.filter((key, value) -> value.getStream().equals("change"));
createEvents.foreach((k,v)->System.out.println("p2 Key= " + k + " Value=" + v.getStream()));
KTable<String, Long> createdPagesUserTypeTable = createEvents.groupBy((key, value) -> value.getUserType()).count();
KStream<String, Long> tableStream = createdPagesUserTypeTable.toStream();
tableStream.foreach((k,v)->System.out.println("Key= " + k + " Value=" + v));
The reason I suspect that nothing executes past the KTable is that the print of the createEvents stream never happens when the KTable definition is present.
Once I remove all lines from the KTable down, I get the prints.
What's gone wrong here?
Also, is there a log of some sort where I can see the execution of my code?
An update:
After looking at the server logs I see this when defining the KTable:
[2022-05-27 19:58:38,983] INFO [GroupCoordinator 0]: Dynamic member with unknown member id joins group streams-wiki in Empty state. Created a new member id streams-wiki-8ea96db7-0052-421a-b7c0-a56cedf9f43e-StreamThread-1-consumer-aa4e311e-2712-4054-ac59-9b56f13d2231 and request the member to rejoin with this id. (kafka.coordinator.group.GroupCoordinator)
[2022-05-27 19:58:38,995] INFO [GroupCoordinator 0]: Preparing to rebalance group streams-wiki in state PreparingRebalance with old generation 2 (__consumer_offsets-22) (reason: Adding new member streams-wiki-8ea96db7-0052-421a-b7c0-a56cedf9f43e-StreamThread-1-consumer-aa4e311e-2712-4054-ac59-9b56f13d2231 with group instance id None; client reason: rebalance failed due to 'The group member needs to have a valid member id before actually entering a consumer group.' (MemberIdRequiredException)) (kafka.coordinator.group.GroupCoordinator)
[2022-05-27 19:58:38,999] INFO [GroupCoordinator 0]: Stabilized group streams-wiki generation 3 (__consumer_offsets-22) with 1 members (kafka.coordinator.group.GroupCoordinator)
[2022-05-27 19:58:39,274] INFO [GroupCoordinator 0]: Assignment received from leader streams-wiki-8ea96db7-0052-421a-b7c0-a56cedf9f43e-StreamThread-1-consumer-aa4e311e-2712-4054-ac59-9b56f13d2231 for group streams-wiki for generation 3. The group has 1 members, 0 of which are static. (kafka.coordinator.group.GroupCoordinator)
[2022-05-27 19:58:39,934] INFO [GroupCoordinator 0]: Preparing to rebalance group streams-wiki in state PreparingRebalance with old generation 3 (__consumer_offsets-22) (reason: Removing member streams-wiki-8ea96db7-0052-421a-b7c0-a56cedf9f43e-StreamThread-1-consumer-aa4e311e-2712-4054-ac59-9b56f13d2231 on LeaveGroup; client reason: the consumer unsubscribed from all topics) (kafka.coordinator.group.GroupCoordinator)
[2022-05-27 19:58:39,934] INFO [GroupCoordinator 0]: Group streams-wiki with generation 4 is now empty (__consumer_offsets-22) (kafka.coordinator.group.GroupCoordinator)
[2022-05-27 19:58:39,938] INFO [GroupCoordinator 0]: Member MemberMetadata(memberId=streams-wiki-8ea96db7-0052-421a-b7c0-a56cedf9f43e-StreamThread-1-consumer-aa4e311e-2712-4054-ac59-9b56f13d2231, groupInstanceId=None, clientId=streams-wiki-8ea96db7-0052-421a-b7c0-a56cedf9f43e-StreamThread-1-consumer, clientHost=/127.0.0.1, sessionTimeoutMs=45000, rebalanceTimeoutMs=300000, supportedProtocols=List(stream)) has left group streams-wiki through explicit `LeaveGroup`; client reason: the consumer unsubscribed from all topics (kafka.coordinator.group.GroupCoordinator)
so it appears that my KTable has somehow caused an unsubscribe from all topics.
Any idea why this is happening?
In the end it turned out that my Java consumer was failing due to a missing StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG definition.
The way to understand this was to add the following lines of code right after
KafkaStreams streams = new KafkaStreams(topology, props);
streams.setUncaughtExceptionHandler((Thread t, Throwable e) -> {
    System.out.println(e);
});
This prints the exception to the console and surfaces additional errors from the consumer.
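For reference, a minimal sketch of the missing configuration, assuming String keys after the groupBy on user type (the value serde choice here is likewise an assumption, not from the question):
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wiki");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// The missing piece: default serdes, used e.g. by the repartition topic
// that groupBy()/count() creates behind the scenes.
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.setUncaughtExceptionHandler((Thread t, Throwable e) -> System.out.println(e));
streams.start();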

Kafka Consumer in spring can I re-assign partitions programmatically?

I'm new to Kafka, and am using @KafkaListener (Spring) to define a kafka consumer.
I would like to check whether it's possible to manually assign partitions to the consumer at runtime.
For example, when the application starts I don't want to "consume" any data. I'm currently using @KafkaListener(autoStartup = "false" ... ) for that purpose.
At some point, I'm supposed to get a notification (from another part of the application) that contains a partitionId to work on, so I would like to "skip" to the latest available offset of that partition, because I don't need to consume the data that already happens to exist there, and "associate" the KafkaConsumer with the partitionId from that notification.
Later on I might get a notification to "stop listening to this partition", despite the fact that the producer that exists somewhere else keeps writing to that topic and partition, so I should "unlink" the consumer from the partition and stop getting messages.
I saw there is an org.springframework.kafka.annotation.TopicPartition, but it provides a way to specify a "static" association, and I'm looking for a "dynamic" way to do so.
I guess I could resort to the low-level Kafka Client API, but I would really prefer to use Spring here.
UPDATE
I use the topic cnp_multi_partition_test_topic with 3 partitions.
My current code, which tries to manage partitions dynamically from the consumer side, looks like this:
@Slf4j
public class SampleKafkaConsumer {

    @KafkaListener(id = Constants.CONSUMER_ID, topics = Constants.TEST_TOPIC, autoStartup = "false")
    public void consumePartition(@Payload String data, @Headers MessageHeaders messageHeaders) {
        Object partitionId = messageHeaders.get(KafkaHeaders.RECEIVED_PARTITION_ID);
        Object sessionId = messageHeaders.get(KafkaHeaders.RECEIVED_MESSAGE_KEY);
        log.info("Consuming from partition: [ {} ] message: Key = [ {} ], content = [ {} ]", partitionId, sessionId, data);
    }
}
@RequiredArgsConstructor
public class MultiPartitionKafkaConsumerManager {

    private final KafkaListenerEndpointRegistry registry;
    private final ConcurrentKafkaListenerContainerFactory<String, String> factory;
    private final UUIDProvider uuidProvider;
    private ConcurrentMessageListenerContainer<String, String> container;

    public void assignPartitions(List<Integer> partitions) {
        if (container != null) {
            container.stop();
            container = null;
        }
        if (partitions.isEmpty()) {
            return;
        }
        var newTopicPartitionOffsets = prepareTopicPartitionOffsets(partitions);
        container = factory.createContainer(newTopicPartitionOffsets);
        container.getContainerProperties().setMessageListener(
                registry.getListenerContainer(Constants.CONSUMER_ID).getContainerProperties().getMessageListener());
        // random group
        container.getContainerProperties().setGroupId("sampleGroup-" + uuidProvider.getUUID().toString());
        container.setConcurrency(1);
        container.start();
    }

    private TopicPartitionOffset[] prepareTopicPartitionOffsets(List<Integer> partitions) {
        return partitions.stream()
                .map(p -> new TopicPartitionOffset(TEST_TOPIC, p, 0L, TopicPartitionOffset.SeekPosition.END))
                .toArray(TopicPartitionOffset[]::new);
    }
}
Both are Spring beans (singletons) managed through Java configuration.
The producer generates 3 messages every second and sends them into the 3 partitions of the test topic. I've used a Kafka UI tool to make sure that all the messages arrive as expected. I use an @EventListener and @Async to make it happen concurrently.
Here is how I try to simulate the work:
@SpringBootTest // kafka is available, omitted for brevity
public class MyTest {

    @Autowired
    MultiPartitionKafkaConsumerManager manager;

    @Test
    public void test_create_kafka_consumer_with_manual_partition_management() throws InterruptedException {
        log.info("Starting the test");
        sleep(5_000);
        log.info("Start listening on partition 0");
        manager.assignPartitions(List.of(0));
        sleep(10_000);
        log.info("Start listening on partition 0,2");
        manager.assignPartitions(List.of(0, 2));
        sleep(10_000);
        log.info("Do not listen on partition 0 anymore");
        manager.assignPartitions(List.of(2));
        sleep(10_000);
        log.info("Do not listen on partition 2 anymore - 0 partitions to listen");
        manager.assignPartitions(Collections.emptyList());
        sleep(10_000);
    }
}
Logs show the following:
06:34:20.164 [main] INFO c.h.c.p.g.m.SamplePartitioningTest - Starting the test
06:34:25.169 [main] INFO c.h.c.p.g.m.SamplePartitioningTest - Start listening on partition 0
06:34:25.360 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka version: 2.5.1
06:34:25.360 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka commitId: 0efa8fb0f4c73d92
06:34:25.361 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1633664065360
06:34:25.405 [main] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-sampleGroup-96640bc4-e34f-4ade-9ff9-7a2d0bdf38c9-1, groupId=sampleGroup-96640bc4-e34f-4ade-9ff9-7a2d0bdf38c9] Subscribed to partition(s): cnp_multi_partition_test_topic-0
06:34:25.422 [main] INFO o.s.s.c.ThreadPoolTaskScheduler - Initializing ExecutorService
06:34:25.429 [consumer-0-C-1] INFO o.a.k.c.c.i.SubscriptionState - [Consumer clientId=consumer-sampleGroup-96640bc4-e34f-4ade-9ff9-7a2d0bdf38c9-1, groupId=sampleGroup-96640bc4-e34f-4ade-9ff9-7a2d0bdf38c9] Seeking to LATEST offset of partition cnp_multi_partition_test_topic-0
06:34:35.438 [main] INFO c.h.c.p.g.m.SamplePartitioningTest - Start listening on partition 0,2
06:34:35.445 [consumer-0-C-1] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-sampleGroup-96640bc4-e34f-4ade-9ff9-7a2d0bdf38c9-1, groupId=sampleGroup-96640bc4-e34f-4ade-9ff9-7a2d0bdf38c9] Unsubscribed all topics or patterns and assigned partitions
06:34:35.445 [consumer-0-C-1] INFO o.s.s.c.ThreadPoolTaskScheduler - Shutting down ExecutorService
06:34:35.453 [consumer-0-C-1] INFO o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - sampleGroup-96640bc4-e34f-4ade-9ff9-7a2d0bdf38c9: Consumer stopped
06:34:35.467 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka version: 2.5.1
06:34:35.467 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka commitId: 0efa8fb0f4c73d92
06:34:35.467 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1633664075467
06:34:35.486 [main] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb-2, groupId=sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb] Subscribed to partition(s): cnp_multi_partition_test_topic-0, cnp_multi_partition_test_topic-2
06:34:35.487 [main] INFO o.s.s.c.ThreadPoolTaskScheduler - Initializing ExecutorService
06:34:35.489 [consumer-0-C-1] INFO o.a.k.c.c.i.SubscriptionState - [Consumer clientId=consumer-sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb-2, groupId=sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb] Seeking to LATEST offset of partition cnp_multi_partition_test_topic-0
06:34:35.489 [consumer-0-C-1] INFO o.a.k.c.c.i.SubscriptionState - [Consumer clientId=consumer-sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb-2, groupId=sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb] Seeking to LATEST offset of partition cnp_multi_partition_test_topic-2
06:34:45.502 [main] INFO c.h.c.p.g.m.SamplePartitioningTest - Do not listen on partition 0 anymore
06:34:45.503 [consumer-0-C-1] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb-2, groupId=sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb] Unsubscribed all topics or patterns and assigned partitions
06:34:45.503 [consumer-0-C-1] INFO o.s.s.c.ThreadPoolTaskScheduler - Shutting down ExecutorService
06:34:45.510 [consumer-0-C-1] INFO o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb: Consumer stopped
06:34:45.527 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka version: 2.5.1
06:34:45.527 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka commitId: 0efa8fb0f4c73d92
06:34:45.527 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1633664085527
06:34:45.551 [main] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-sampleGroup-5e12d8c7-5900-434a-959f-98b14adda698-3, groupId=sampleGroup-5e12d8c7-5900-434a-959f-98b14adda698] Subscribed to partition(s): cnp_multi_partition_test_topic-2
06:34:45.551 [main] INFO o.s.s.c.ThreadPoolTaskScheduler - Initializing ExecutorService
06:34:45.554 [consumer-0-C-1] INFO o.a.k.c.c.i.SubscriptionState - [Consumer clientId=consumer-sampleGroup-5e12d8c7-5900-434a-959f-98b14adda698-3, groupId=sampleGroup-5e12d8c7-5900-434a-959f-98b14adda698] Seeking to LATEST offset of partition cnp_multi_partition_test_topic-2
06:34:55.560 [main] INFO c.h.c.p.g.m.SamplePartitioningTest - Do not listen on partition 2 anymore - 0 partitions to listen
06:34:55.561 [consumer-0-C-1] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-sampleGroup-5e12d8c7-5900-434a-959f-98b14adda698-3, groupId=sampleGroup-5e12d8c7-5900-434a-959f-98b14adda698] Unsubscribed all topics or patterns and assigned partitions
06:34:55.562 [consumer-0-C-1] INFO o.s.s.c.ThreadPoolTaskScheduler - Shutting down ExecutorService
06:34:55.576 [consumer-0-C-1] INFO o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - sampleGroup-5e12d8c7-5900-434a-959f-98b14adda698: Consumer stopped
So I do see that the consumer is started, and it even tries to poll the records internally, but I think I see a WakeupException thrown and "swallowed" by a proxy. I'm not sure I understand why that happens.
You can't change manual assignments at runtime. There are several ways to achieve your desired result.
You can declare the listener in a prototype bean; see Can i add topics to my @kafkalistener at runtime
You can use the listener container factory to create a new container with the appropriate topic configuration and copy the listener from the statically declared container.
I can provide an example of the latter if needed.
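For the former, a rough sketch of the prototype-bean idea might look like this (hedged: it relies on spring-kafka's __listener SpEL pseudo variable, and the class name is illustrative):
import org.springframework.beans.factory.config.ConfigurableBeanFactory;
import org.springframework.context.annotation.Scope;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
@Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE)
public class DynamicTopicListener {

    private final String topic;

    public DynamicTopicListener(String topic) {
        this.topic = topic;
    }

    public String getTopic() {
        return this.topic;
    }

    // "__listener" resolves to this bean instance when the container is created
    @KafkaListener(id = "#{__listener.topic}", topics = "#{__listener.topic}")
    public void listen(String in) {
        System.out.println(in);
    }
}
Each applicationContext.getBean(DynamicTopicListener.class, "someTopic") call would then create and start an independent listener for that topic.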
...
EDIT
Here's an example for the second technique...
@SpringBootApplication
public class So69465733Application {

    public static void main(String[] args) {
        SpringApplication.run(So69465733Application.class, args);
    }

    @KafkaListener(id = "dummy", topics = "dummy", autoStartup = "false")
    void listen(String in) {
        System.out.println(in);
    }

    @Bean
    ApplicationRunner runner(KafkaListenerEndpointRegistry registry,
            ConcurrentKafkaListenerContainerFactory<String, String> factory) {
        return args -> {
            System.out.println("Hit Enter to create a container for topic1, partition0");
            System.in.read();
            ConcurrentMessageListenerContainer<String, String> container1 =
                    factory.createContainer(new TopicPartitionOffset("topic1", 0, SeekPosition.END));
            container1.getContainerProperties().setMessageListener(
                    registry.getListenerContainer("dummy").getContainerProperties().getMessageListener());
            container1.getContainerProperties().setGroupId("topic1-0-group2");
            container1.start();
            System.out.println("Hit Enter to create a container for topic2, partition0");
            System.in.read();
            ConcurrentMessageListenerContainer<String, String> container2 =
                    factory.createContainer(new TopicPartitionOffset("topic2", 0, SeekPosition.END));
            container2.getContainerProperties().setMessageListener(
                    registry.getListenerContainer("dummy").getContainerProperties().getMessageListener());
            container2.getContainerProperties().setGroupId("topic2-0-group2");
            container2.start();
            System.out.println("Hit Enter to stop containers");
            System.in.read();
            container1.stop();
            container2.stop();
        };
    }
}
EDIT
Log after sending records to topic1, topic2 from the command-line producer.
Hit Enter to create a container for topic1, partition0
ConsumerConfig values:
...
Kafka version: 2.7.1
Kafka commitId: 61dbce85d0d41457
Kafka startTimeMs: 1633622966736
[Consumer clientId=consumer-topic1-0-group2-1, groupId=topic1-0-group2] Subscribed to partition(s): topic1-0
Hit Enter to create a container for topic2, partition0
[Consumer clientId=consumer-topic1-0-group2-1, groupId=topic1-0-group2] Seeking to LATEST offset of partition topic1-0
[Consumer clientId=consumer-topic1-0-group2-1, groupId=topic1-0-group2] Cluster ID: ppGfIGsZTUWRTNmRXByfZg
[Consumer clientId=consumer-topic1-0-group2-1, groupId=topic1-0-group2] Resetting offset for partition topic1-0 to position FetchPosition{offset=2, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:9092 (id: 0 rack: null)], epoch=0}}.
ConsumerConfig values:
...
Kafka version: 2.7.1
Kafka commitId: 61dbce85d0d41457
Kafka startTimeMs: 1633622969071
[Consumer clientId=consumer-topic2-0-group2-2, groupId=topic2-0-group2] Subscribed to partition(s): topic2-0
Hit Enter to stop containers
[Consumer clientId=consumer-topic2-0-group2-2, groupId=topic2-0-group2] Seeking to LATEST offset of partition topic2-0
[Consumer clientId=consumer-topic2-0-group2-2, groupId=topic2-0-group2] Cluster ID: ppGfIGsZTUWRTNmRXByfZg
[Consumer clientId=consumer-topic2-0-group2-2, groupId=topic2-0-group2] Resetting offset for partition topic2-0 to position FetchPosition{offset=2, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:9092 (id: 0 rack: null)], epoch=0}}.
record from topic1
[Consumer clientId=consumer-topic1-0-group2-1, groupId=topic1-0-group2] Discovered group coordinator localhost:9092 (id: 2147483647 rack: null)
record from topic2
[Consumer clientId=consumer-topic2-0-group2-2, groupId=topic2-0-group2] Discovered group coordinator localhost:9092 (id: 2147483647 rack: null)
Application shutdown requested.

Problem with Kafka client when broker is down

I am seeing this exception in my kafka client when the broker is down:
java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
at org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:2452)
at org.apache.kafka.clients.consumer.KafkaConsumer.acquireAndEnsureOpen(KafkaConsumer.java:2436)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1217)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210)
at com.actimize.infrastructure.config.KafkaAlertsDistributor$1.run(KafkaAlertsDistributor.java:71)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The problem is, I am not running a multi-threaded application. I am running a hello-world example with a single thread, and wanted to see how the client behaves when the broker is down (because I want to start the broker later in unit tests).
Here's my code, give or take:
ExecutorService executor = Executors.newSingleThreadExecutor();
executor.execute(createRunnable());
...
// in the runnable's run method
Properties props = // create props
consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("test-topic"));
while (true) {
    ConsumerRecords<String, String> records = null;
    try {
        System.out.println("going to poll");
        records = consumer.poll(Duration.ofSeconds(1));
        System.out.println("finished polling, got " + records.count() + " records");
    } catch (WakeupException e) {
        e.printStackTrace();
        continue;
    } catch (Throwable e) {
        e.printStackTrace();
    }
    for (ConsumerRecord<String, String> record : records) {
        Map<String, Object> data = new HashMap<>();
        data.put("partition", record.partition());
        data.put("offset", record.offset());
        data.put("value", record.value());
        System.out.println("consumer got: " + data);
    }
}
When the broker is down, the poll() method works fine for the first 4 or 5 times. It returns zero records and prints a warning to the log. By the 5th or 6th time it starts outputting this error.
Here is the full log. It shows that there are two threads (pool-3 and pool-4) doing some work behind the scenes. I am not sure why this is happening; it's not coming from my code.
2021-02-21 12:16:00,057 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
2021-02-21 12:16:00,404 WARN [pool-3-thread-1] clients.NetworkClient (NetworkClient.java:757) - [Consumer clientId=consumer-consumer-tutorial-1, groupId=consumer-tutorial] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
2021-02-21 12:16:00,404 WARN [pool-3-thread-1] clients.NetworkClient$DefaultMetadataUpdater (NetworkClient.java:1033) - [Consumer clientId=consumer-consumer-tutorial-1, groupId=consumer-tutorial] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected
2021-02-21 12:16:01,057 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:70) - finished polling, got 0 records
2021-02-21 12:16:01,057 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
2021-02-21 12:16:02,057 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:70) - finished polling, got 0 records
2021-02-21 12:16:02,057 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
2021-02-21 12:16:02,427 INFO [pool-4-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
2021-02-21 12:16:02,923 WARN [pool-3-thread-1] clients.NetworkClient (NetworkClient.java:757) - [Consumer clientId=consumer-consumer-tutorial-1, groupId=consumer-tutorial] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
2021-02-21 12:16:02,924 WARN [pool-3-thread-1] clients.NetworkClient$DefaultMetadataUpdater (NetworkClient.java:1033) - [Consumer clientId=consumer-consumer-tutorial-1, groupId=consumer-tutorial] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected
2021-02-21 12:16:03,058 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:70) - finished polling, got 0 records
2021-02-21 12:16:03,058 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
2021-02-21 12:16:03,061 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:75) - error
java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
at org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:2452)
at org.apache.kafka.clients.consumer.KafkaConsumer.acquireAndEnsureOpen(KafkaConsumer.java:2436)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1217)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210)
at com.actimize.infrastructure.config.KafkaConsumerSample$1.run(KafkaConsumerSample.java:69)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "pool-3-thread-1" java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
at org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:2452)
at org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:2335)
at org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:2290)
at com.actimize.infrastructure.config.KafkaConsumerSample$1.run(KafkaConsumerSample.java:88)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2021-02-21 12:16:03,429 INFO [pool-4-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:70) - finished polling, got 0 records
2021-02-21 12:16:03,429 INFO [pool-4-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
Looking at the logs you've shared, two threads start polling at almost the same time:
2021-02-21 12:16:02,057 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
2021-02-21 12:16:02,427 INFO [pool-4-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
There are extra measures to take into consideration in order to implement a multithreaded consumer.
The most important points you may want to tackle are:
Ensure that records from the same partitions are processed only by one thread at a time
Commit offsets only after records are processed
Handle group rebalancing properly
Further reading: Kafka Consumer Multi Threaded Messaging
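For the first point, a common approach is the thread-per-consumer model: each thread owns its own KafkaConsumer, and nothing else ever touches it; within a consumer group, Kafka then ensures each partition is processed by only one of them at a time. A minimal sketch (topic name and group id are illustrative):
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ThreadPerConsumer {
    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 2; i++) {
            executor.execute(() -> {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092");
                props.put("group.id", "consumer-tutorial");
                props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                // Each thread creates and uses its OWN consumer; a KafkaConsumer
                // instance must never be accessed from more than one thread.
                try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                    consumer.subscribe(Arrays.asList("test-topic"));
                    while (!Thread.currentThread().isInterrupted()) {
                        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                        for (ConsumerRecord<String, String> record : records) {
                            System.out.println(Thread.currentThread().getName() + " got: " + record.value());
                        }
                    }
                }
            });
        }
    }
}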

Reading data from Kafka in batches does not work correctly using Spring Boot

I am using Spring Boot and want to read data from Kafka in batches. My application.yml looks like this:
spring:
  kafka:
    bootstrap-servers:
      - localhost:9092
    properties:
      schema.registry.url: http://localhost:8081
    consumer:
      auto-offset-reset: earliest
      max-poll-records: 50000
      enable-auto-commit: true
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: io.confluent.kafka.serializers.KafkaAvroDeserializer
      group-id: "batch"
      properties:
        fetch.min.bytes: 1000000
        fetch.max.wait.ms: 20000
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: io.confluent.kafka.serializers.KafkaAvroSerializer
    listener:
      type: batch
My listener:
#KafkaListener(id = "bar2", topics = "TestTopic")
public void listen(List<ConsumerRecord<String, GenericRecord>> records) {
log.info("start of batch receive. Size::{}", records.size());
}
In log I see:
2019-10-04 11:08:19.693 INFO 2123 --- [ bar2-0-C-1] kafka.batch.demo.DemoApplication : start of batch receive. Size::33279
2019-10-04 11:08:19.746 INFO 2123 --- [ bar2-0-C-1] kafka.batch.demo.DemoApplication : start of batch receive. Size::33353
2019-10-04 11:08:19.784 INFO 2123 --- [ bar2-0-C-1] kafka.batch.demo.DemoApplication : start of batch receive. Size::33400
2019-10-04 11:08:19.821 INFO 2123 --- [ bar2-0-C-1] kafka.batch.demo.DemoApplication : start of batch receive. Size::33556
2019-10-04 11:08:39.859 INFO 2123 --- [ bar2-0-C-1] kafka.batch.demo.DemoApplication : start of batch receive. Size::16412
I set the required settings, fetch.min.bytes and fetch.max.wait.ms, but they do not have any effect.
In the log I see that a batch is never larger than about 33 thousand records, regardless of the settings. I've racked my brain and I don't understand why this is happening.
max.poll.records is simply a maximum.
There are other properties that influence how many records you get:
fetch.min.bytes - The minimum amount of data the server should return for a fetch request. If insufficient data is available the request will wait for that much data to accumulate before answering the request. The default setting of 1 byte means that fetch requests are answered as soon as a single byte of data is available or the fetch request times out waiting for data to arrive. Setting this to something greater than 1 will cause the server to wait for larger amounts of data to accumulate which can improve server throughput a bit at the cost of some additional latency.
fetch.max.wait.ms- The maximum amount of time the server will block before answering the fetch request if there isn't sufficient data to immediately satisfy the requirement given by fetch.min.bytes.
See the documentation.
There is no way to exactly control the minimum number of records (unless they are all identical in length).
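One knob that often imposes the real ceiling is the per-partition fetch limit. A sketch of the equivalent raw consumer properties (raising max.partition.fetch.bytes is an assumption worth testing, not a confirmed fix for the 33-thousand cap):
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

Properties props = new Properties();
// upper bound on records returned per poll()
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 50000);
// broker waits until this much data has accumulated...
props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1_000_000);
// ...or until this much time has passed, whichever comes first
props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 20000);
// default is 1 MB per partition per fetch, which can cap batch sizes
// well below max-poll-records
props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 10 * 1024 * 1024);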

Kafka Consumer (Java) polls 0 messages

I am seeing this issue in my Kafka Java client where the consumer stops consuming after polling a few messages. It's not that the consumer hangs; it's unable to find messages in the topic partitions and polls 0 messages. I have 4 partitions configured for the topic and 2 consumers in the consumer group.
Consumer Log:
Thread-5:2016-05-11 at 07:35:21.893 UTC INFO xxxxx.KafkaConsumerClient:71 pullFromQueue polled 0 messages from topic: test:[test-0, test-1] partition : []
Thread-5:2016-05-11 at 07:35:31.893 UTC INFO xxxxx.KafkaConsumerClient:71 pullFromQueue polled 0 messages from topic: test:[test-0, test-1] partition : []
Thread-5:2016-05-11 at 07:35:41.893 UTC INFO xxxxx.KafkaConsumerClient:71 pullFromQueue polled 0 messages from topic: test:[test-0, test-1] partition : []
Thread-5:2016-05-11 at 07:35:51.893 UTC INFO xxxxx.KafkaConsumerClient:71 pullFromQueue polled 0 messages from topic: test:[test-0, test-1] partition : []
Here, the log suggests that this consumer is connected to partitions 0 and 1 but is unable to consume any messages.
Consumer Offset:
Group Topic Pid Offset logSize Lag Owner
test-consumer test 0 1147335 1150034 2699 none
test-consumer test 1 1147471 1150033 2562 none
test-consumer test 2 1150035 1150035 0 none
test-consumer test 3 1150031 1150031 0 none
This shows that my topic has 2699 and 2562 messages pending on partitions 0 and 1, respectively.
