Good morning,
I'm trying to run a Kafka Streams application, but every time I try, it starts and then shuts down right away. Below is the output printed on the console:
[main] WARN org.apache.kafka.clients.consumer.ConsumerConfig - The configuration 'admin.retries' was supplied but isn't a known config.
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 2.1.0
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : eec43959745f444f
[application-brute-test-client-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [application-brute-test-client-StreamThread-1] Starting
[main] INFO org.apache.kafka.streams.KafkaStreams - stream-client [application-brute-test-client] Started Streams client
[application-brute-test-client-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [application-brute-test-client-StreamThread-1] State transition from CREATED to RUNNING
[Thread-0] INFO org.apache.kafka.streams.KafkaStreams - stream-client [application-brute-test-client] State transition from RUNNING to PENDING_SHUTDOWN
[kafka-streams-close-thread] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [application-brute-test-client-StreamThread-1] Informed to shut down
[kafka-streams-close-thread] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [application-brute-test-client-StreamThread-1] State transition from RUNNING to PENDING_SHUTDOWN
[application-brute-test-client-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [application-brute-test-client-StreamThread-1] Shutting down
[application-brute-test-client-StreamThread-1] INFO org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=application-brute-test-client-StreamThread-1-restore-consumer, groupId=] Unsubscribed all topics or patterns and assigned partitions
[application-brute-test-client-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [application-brute-test-client-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
[application-brute-test-client-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [application-brute-test-client-StreamThread-1] Shutdown complete
[kafka-admin-client-thread | application-brute-test-client-admin] INFO org.apache.kafka.clients.admin.internals.AdminMetadataManager - [AdminClient clientId=application-brute-test-client-admin] Metadata update failed
org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call.
[kafka-streams-close-thread] INFO org.apache.kafka.streams.KafkaStreams - stream-client [application-brute-test-client] State transition from PENDING_SHUTDOWN to NOT_RUNNING
[Thread-0] INFO org.apache.kafka.streams.KafkaStreams - stream-client [application-brute-test-client] Streams client stopped completely
Note the following line:
[application-brute-test-client-StreamThread-1] Informed to shut down
The application was informed to shut down, but I don't know why. Can someone help me with this problem?
Here is the simple code I use to test the stream:
Properties properties = new Properties();
properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "myserver");
properties.put(StreamsConfig.APPLICATION_ID_CONFIG, "application-brute-test");
properties.put(StreamsConfig.CLIENT_ID_CONFIG, "application-brute-test-client");
properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
properties.setProperty(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE); // Enable the exactly-once feature
properties.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass()); // Set a default key serde
properties.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass()); // Set a default value serde
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> input = builder.stream("neurotech_propostas", Consumed.with(Serdes.String(), Serdes.String()));
input.print(Printed.toSysOut());
KStream<String, String> output = input.mapValues((value) -> value.toUpperCase());
output.to("brute-test-out");
KafkaStreams stream = new KafkaStreams(builder.build(), properties);
stream.cleanUp();
stream.start();
Runtime.getRuntime().addShutdownHook(new Thread(stream::close));
To solve the problem, I simply stopped using JUnit to run the stream and ran it from a Main class instead. Running Kafka Streams via JUnit was causing the trouble.
Maybe JUnit does not keep the thread alive in this environment?
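One plausible reading of the log: `KafkaStreams.start()` does not block, so the JUnit test method returns right after it, the runner ends the JVM, and the shutdown hook (possibly the `Thread-0` seen in the log) calls `close()`. The sketch below shows the general remedy of holding the calling thread until a stop signal or a timeout arrives. It uses only the JDK, and `BackgroundService` is a hypothetical stand-in for `KafkaStreams`, not its real API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class KeepAliveSketch {

    /** Hypothetical stand-in for KafkaStreams: start() returns at once, work runs on another thread. */
    static final class BackgroundService {
        private final AtomicInteger ticks = new AtomicInteger();
        private volatile boolean running;
        private Thread worker;

        void start() {
            running = true;
            worker = new Thread(() -> {
                while (running) {
                    ticks.incrementAndGet(); // pretend to process something
                    try {
                        Thread.sleep(10);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }, "background-service");
            worker.start();
        }

        void close() {
            running = false;
            worker.interrupt();
            try {
                worker.join(1_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        int ticks() {
            return ticks.get();
        }
    }

    /**
     * Starts the service, blocks the caller until stopSignal is counted down
     * or the timeout elapses, then closes the service and returns its tick count.
     */
    static int runBlocking(BackgroundService service, CountDownLatch stopSignal, long timeoutMillis) {
        service.start();
        try {
            stopSignal.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            service.close();
        }
        return service.ticks();
    }

    public static void main(String[] args) {
        int ticks = runBlocking(new BackgroundService(), new CountDownLatch(1), 200);
        System.out.println("worker ran " + ticks + " iterations before close");
    }
}
```

In a real application the latch would typically be counted down from the shutdown hook, while a test would rather wait for a bounded time as shown in `main`.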
I'm new to Kafka, and I'm using @KafkaListener (Spring) to define a Kafka consumer.
I would like to check whether it's possible to manually assign a partition to the consumer at runtime.
For example, when the application starts I don't want to "consume" any data. I'm currently using @KafkaListener(autoStartup=false ... ) for that purpose.
At some point, I'm supposed to get a notification (from another part of the application) that contains a partitionId to work on. I would then like to "skip" to the latest available offset of that partition, because I don't need to consume the data that already exists there, and "associate" the KafkaConsumer with the partitionId from that notification.
Later on I might get a notification to "stop listening to this partition", even though the producer that exists somewhere else keeps writing to that topic and that partition, so I should "unlink" the consumer from the partition and stop getting messages.
I saw there is an org.springframework.kafka.annotation.TopicPartition, but it only provides a way to specify a "static" association, so I'm looking for a "dynamic" way to do so.
I guess I could resort to the low-level Kafka Client API but I would really prefer to use spring here.
UPDATE
I use topic cnp_multi_partition_test_topic with 3 partitions.
My current code, which tries to manage partitions dynamically from the consumer, looks like this:
@Slf4j
public class SampleKafkaConsumer {
@KafkaListener(id = Constants.CONSUMER_ID, topics = Constants.TEST_TOPIC, autoStartup = "false")
public void consumePartition(@Payload String data, @Headers MessageHeaders messageHeaders) {
Object partitionId = messageHeaders.get(KafkaHeaders.RECEIVED_PARTITION_ID);
Object sessionId = messageHeaders.get(KafkaHeaders.RECEIVED_MESSAGE_KEY);
log.info("Consuming from partition: [ {} ] message: Key = [ {} ], content = [ {} ]",partitionId, sessionId, data);
}
}
@RequiredArgsConstructor
public class MultiPartitionKafkaConsumerManager {
private final KafkaListenerEndpointRegistry registry;
private final ConcurrentKafkaListenerContainerFactory<String, String> factory;
private final UUIDProvider uuidProvider;
private ConcurrentMessageListenerContainer<String, String> container;
public void assignPartitions(List<Integer> partitions) {
if(container != null) {
container.stop();
container = null;
}
if(partitions.isEmpty()) {
return;
}
var newTopicPartitionOffsets = prepareTopicPartitionOffsets(partitions);
container =
factory.createContainer(newTopicPartitionOffsets);
container.getContainerProperties().setMessageListener(
registry.getListenerContainer(Constants.CONSUMER_ID).getContainerProperties().getMessageListener());
// random group
container.getContainerProperties().setGroupId("sampleGroup-" + uuidProvider.getUUID().toString());
container.setConcurrency(1);
container.start();
}
private TopicPartitionOffset[] prepareTopicPartitionOffsets(List<Integer> partitions) {
return partitions.stream()
.map(p -> new TopicPartitionOffset(TEST_TOPIC, p, 0L, TopicPartitionOffset.SeekPosition.END))
.collect(Collectors.toList())
.toArray(new TopicPartitionOffset[] {});
}
}
Both are Spring beans (singletons) managed through java configuration.
The producer generates 3 messages every second and sends them to the 3 partitions of the test topic. I've used a Kafka UI tool to make sure that all the messages indeed arrive as expected. I use an @EventListener and @Async to make it happen concurrently.
Here is how I try to simulate the work:
@SpringBootTest // kafka is available, omitted for brevity
public class MyTest {
@Autowired
MultiPartitionKafkaConsumerManager manager;
@Test
public void test_create_kafka_consumer_with_manual_partition_management() throws InterruptedException {
log.info("Starting the test");
sleep(5_000);
log.info("Start listening on partition 0");
manager.assignPartitions(List.of(0));
sleep(10_000);
log.info("Start listening on partition 0,2");
manager.assignPartitions(List.of(0,2));
sleep(10_000);
log.info("Do not listen on partition 0 anymore");
manager.assignPartitions(List.of(2));
sleep(10_000);
log.info("Do not listen on partition 2 anymore - 0 partitions to listen");
manager.assignPartitions(Collections.emptyList());
sleep(10_000);
}
}
Logs show the following:
06:34:20.164 [main] INFO c.h.c.p.g.m.SamplePartitioningTest - Starting the test
06:34:25.169 [main] INFO c.h.c.p.g.m.SamplePartitioningTest - Start listening on partition 0
06:34:25.360 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka version: 2.5.1
06:34:25.360 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka commitId: 0efa8fb0f4c73d92
06:34:25.361 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1633664065360
06:34:25.405 [main] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-sampleGroup-96640bc4-e34f-4ade-9ff9-7a2d0bdf38c9-1, groupId=sampleGroup-96640bc4-e34f-4ade-9ff9-7a2d0bdf38c9] Subscribed to partition(s): cnp_multi_partition_test_topic-0
06:34:25.422 [main] INFO o.s.s.c.ThreadPoolTaskScheduler - Initializing ExecutorService
06:34:25.429 [consumer-0-C-1] INFO o.a.k.c.c.i.SubscriptionState - [Consumer clientId=consumer-sampleGroup-96640bc4-e34f-4ade-9ff9-7a2d0bdf38c9-1, groupId=sampleGroup-96640bc4-e34f-4ade-9ff9-7a2d0bdf38c9] Seeking to LATEST offset of partition cnp_multi_partition_test_topic-0
06:34:35.438 [main] INFO c.h.c.p.g.m.SamplePartitioningTest - Start listening on partition 0,2
06:34:35.445 [consumer-0-C-1] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-sampleGroup-96640bc4-e34f-4ade-9ff9-7a2d0bdf38c9-1, groupId=sampleGroup-96640bc4-e34f-4ade-9ff9-7a2d0bdf38c9] Unsubscribed all topics or patterns and assigned partitions
06:34:35.445 [consumer-0-C-1] INFO o.s.s.c.ThreadPoolTaskScheduler - Shutting down ExecutorService
06:34:35.453 [consumer-0-C-1] INFO o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - sampleGroup-96640bc4-e34f-4ade-9ff9-7a2d0bdf38c9: Consumer stopped
06:34:35.467 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka version: 2.5.1
06:34:35.467 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka commitId: 0efa8fb0f4c73d92
06:34:35.467 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1633664075467
06:34:35.486 [main] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb-2, groupId=sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb] Subscribed to partition(s): cnp_multi_partition_test_topic-0, cnp_multi_partition_test_topic-2
06:34:35.487 [main] INFO o.s.s.c.ThreadPoolTaskScheduler - Initializing ExecutorService
06:34:35.489 [consumer-0-C-1] INFO o.a.k.c.c.i.SubscriptionState - [Consumer clientId=consumer-sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb-2, groupId=sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb] Seeking to LATEST offset of partition cnp_multi_partition_test_topic-0
06:34:35.489 [consumer-0-C-1] INFO o.a.k.c.c.i.SubscriptionState - [Consumer clientId=consumer-sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb-2, groupId=sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb] Seeking to LATEST offset of partition cnp_multi_partition_test_topic-2
06:34:45.502 [main] INFO c.h.c.p.g.m.SamplePartitioningTest - Do not listen on partition 0 anymore
06:34:45.503 [consumer-0-C-1] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb-2, groupId=sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb] Unsubscribed all topics or patterns and assigned partitions
06:34:45.503 [consumer-0-C-1] INFO o.s.s.c.ThreadPoolTaskScheduler - Shutting down ExecutorService
06:34:45.510 [consumer-0-C-1] INFO o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - sampleGroup-05fb12f3-aba1-4918-bcf6-a1f840de13eb: Consumer stopped
06:34:45.527 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka version: 2.5.1
06:34:45.527 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka commitId: 0efa8fb0f4c73d92
06:34:45.527 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1633664085527
06:34:45.551 [main] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-sampleGroup-5e12d8c7-5900-434a-959f-98b14adda698-3, groupId=sampleGroup-5e12d8c7-5900-434a-959f-98b14adda698] Subscribed to partition(s): cnp_multi_partition_test_topic-2
06:34:45.551 [main] INFO o.s.s.c.ThreadPoolTaskScheduler - Initializing ExecutorService
06:34:45.554 [consumer-0-C-1] INFO o.a.k.c.c.i.SubscriptionState - [Consumer clientId=consumer-sampleGroup-5e12d8c7-5900-434a-959f-98b14adda698-3, groupId=sampleGroup-5e12d8c7-5900-434a-959f-98b14adda698] Seeking to LATEST offset of partition cnp_multi_partition_test_topic-2
06:34:55.560 [main] INFO c.h.c.p.g.m.SamplePartitioningTest - Do not listen on partition 2 anymore - 0 partitions to listen
06:34:55.561 [consumer-0-C-1] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-sampleGroup-5e12d8c7-5900-434a-959f-98b14adda698-3, groupId=sampleGroup-5e12d8c7-5900-434a-959f-98b14adda698] Unsubscribed all topics or patterns and assigned partitions
06:34:55.562 [consumer-0-C-1] INFO o.s.s.c.ThreadPoolTaskScheduler - Shutting down ExecutorService
06:34:55.576 [consumer-0-C-1] INFO o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - sampleGroup-5e12d8c7-5900-434a-959f-98b14adda698: Consumer stopped
So I do see that the consumer is started and that it even tries to poll the records internally, but I think I see a WakeupException thrown and "swallowed" by a proxy. I'm not sure I understand why this happens.
You can't change manual assignments at runtime. There are several ways to achieve your desired result.
You can declare the listener in a prototype bean; see "Can i add topics to my @kafkalistener at runtime".
You can use the listener container factory to create a new container with the appropriate topic configuration and copy the listener from the statically declared container.
I can provide an example of the latter if needed.
...
EDIT
Here's an example for the second technique...
@SpringBootApplication
public class So69465733Application {
public static void main(String[] args) {
SpringApplication.run(So69465733Application.class, args);
}
@KafkaListener(id = "dummy", topics = "dummy", autoStartup = "false")
void listen(String in) {
System.out.println(in);
}
@Bean
ApplicationRunner runner(KafkaListenerEndpointRegistry registry,
ConcurrentKafkaListenerContainerFactory<String, String> factory) {
return args -> {
System.out.println("Hit Enter to create a container for topic1, partition0");
System.in.read();
ConcurrentMessageListenerContainer<String, String> container1 =
factory.createContainer(new TopicPartitionOffset("topic1", 0, SeekPosition.END));
container1.getContainerProperties().setMessageListener(
registry.getListenerContainer("dummy").getContainerProperties().getMessageListener());
container1.getContainerProperties().setGroupId("topic1-0-group2");
container1.start();
System.out.println("Hit Enter to create a container for topic2, partition0");
System.in.read();
ConcurrentMessageListenerContainer<String, String> container2 =
factory.createContainer(new TopicPartitionOffset("topic2", 0, SeekPosition.END));
container2.getContainerProperties().setMessageListener(
registry.getListenerContainer("dummy").getContainerProperties().getMessageListener());
container2.getContainerProperties().setGroupId("topic2-0-group2");
container2.start();
System.out.println("Hit Enter to stop containers");
System.in.read();
container1.stop();
container2.stop();
};
}
}
EDIT
Log after sending records to topic1, topic2 from the command-line producer.
Hit Enter to create a container for topic1, partition0
ConsumerConfig values:
...
Kafka version: 2.7.1
Kafka commitId: 61dbce85d0d41457
Kafka startTimeMs: 1633622966736
[Consumer clientId=consumer-topic1-0-group2-1, groupId=topic1-0-group2] Subscribed to partition(s): topic1-0
Hit Enter to create a container for topic2, partition0
[Consumer clientId=consumer-topic1-0-group2-1, groupId=topic1-0-group2] Seeking to LATEST offset of partition topic1-0
[Consumer clientId=consumer-topic1-0-group2-1, groupId=topic1-0-group2] Cluster ID: ppGfIGsZTUWRTNmRXByfZg
[Consumer clientId=consumer-topic1-0-group2-1, groupId=topic1-0-group2] Resetting offset for partition topic1-0 to position FetchPosition{offset=2, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:9092 (id: 0 rack: null)], epoch=0}}.
ConsumerConfig values:
...
Kafka version: 2.7.1
Kafka commitId: 61dbce85d0d41457
Kafka startTimeMs: 1633622969071
[Consumer clientId=consumer-topic2-0-group2-2, groupId=topic2-0-group2] Subscribed to partition(s): topic2-0
Hit Enter to stop containers
[Consumer clientId=consumer-topic2-0-group2-2, groupId=topic2-0-group2] Seeking to LATEST offset of partition topic2-0
[Consumer clientId=consumer-topic2-0-group2-2, groupId=topic2-0-group2] Cluster ID: ppGfIGsZTUWRTNmRXByfZg
[Consumer clientId=consumer-topic2-0-group2-2, groupId=topic2-0-group2] Resetting offset for partition topic2-0 to position FetchPosition{offset=2, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:9092 (id: 0 rack: null)], epoch=0}}.
record from topic1
[Consumer clientId=consumer-topic1-0-group2-1, groupId=topic1-0-group2] Discovered group coordinator localhost:9092 (id: 2147483647 rack: null)
record from topic2
[Consumer clientId=consumer-topic2-0-group2-2, groupId=topic2-0-group2] Discovered group coordinator localhost:9092 (id: 2147483647 rack: null)
Application shutdown requested.
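As a side note on the manager in the question: it stops the single container and creates a new one for the full partition list on every change, so a partition that stays assigned (partition 0 in the second step of the test) is restarted and positioned at the end again. An alternative is to keep one container per partition and start or stop only the difference. The sketch below shows just that bookkeeping with plain JDK types; `Stoppable` and the `starter` callback are hypothetical stand-ins for creating and stopping a container as in the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.IntFunction;

final class PartitionAssignmentSketch {

    /** Hypothetical handle for whatever consumes one partition (e.g. a started listener container). */
    interface Stoppable {
        void stop();
    }

    private final Map<Integer, Stoppable> active = new TreeMap<>();
    private final IntFunction<Stoppable> starter;

    /** starter is called once per newly assigned partition and returns the handle used to stop it later. */
    PartitionAssignmentSketch(IntFunction<Stoppable> starter) {
        this.starter = starter;
    }

    /** Moves from the current assignment to the desired one, touching only what changed. */
    synchronized void assignPartitions(Set<Integer> desired) {
        // Stop partitions that are no longer wanted.
        List<Integer> toStop = new ArrayList<>(active.keySet());
        toStop.removeAll(desired);
        for (Integer partition : toStop) {
            active.remove(partition).stop();
        }
        // Start partitions that are newly wanted; already running ones are left alone.
        for (Integer partition : new TreeSet<>(desired)) {
            active.computeIfAbsent(partition, key -> starter.apply(key));
        }
    }

    synchronized Set<Integer> activePartitions() {
        return new TreeSet<>(active.keySet());
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        PartitionAssignmentSketch manager = new PartitionAssignmentSketch(partition -> {
            events.add("start " + partition);
            return () -> events.add("stop " + partition);
        });
        manager.assignPartitions(Set.of(0));
        manager.assignPartitions(Set.of(0, 2)); // partition 0 keeps running
        manager.assignPartitions(Set.of(2));
        manager.assignPartitions(Set.of());
        System.out.println(events);
    }
}
```

With Spring Kafka, the `starter` callback would presumably wrap the `factory.createContainer(...)`, `setMessageListener(...)` and `start()` calls shown above, and `stop()` would delegate to the container's `stop()`.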
I am seeing this exception in my kafka client when the broker is down:
java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
at org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:2452)
at org.apache.kafka.clients.consumer.KafkaConsumer.acquireAndEnsureOpen(KafkaConsumer.java:2436)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1217)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210)
at com.actimize.infrastructure.config.KafkaAlertsDistributor$1.run(KafkaAlertsDistributor.java:71)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The problem is, I am not running a multi-threaded application. I am running a hello-world example with a single thread and wanted to see how it behaves when the broker is down (because I want to start the broker later in unit tests).
Here's my code, give or take:
ExecutorService executor = Executors.newSingleThreadExecutor();
executor.execute (createRunnable());
...
// in the runnable's run method
Properties props = // create props
consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("test-topic"));
while (true) {
ConsumerRecords<String, String> records = null;
try {
System.out.println("going to poll");
records = consumer.poll(Duration.ofSeconds(1));
System.out.println("finished polling, got " + records.count() + " records");
} catch (WakeupException e) {
e.printStackTrace();
continue;
} catch (Throwable e) {
e.printStackTrace();
}
for (ConsumerRecord<String, String> record : records) {
Map<String, Object> data = new HashMap<>();
data.put("partition", record.partition());
data.put("offset", record.offset());
data.put("value", record.value());
System.out.println("consumer got: " + data);
}
}
When the broker is down, the poll() method works fine for the first 4 or 5 calls. It returns zero records and prints a warning to the log. By the 5th or 6th call it starts outputting this error.
Here is a full log. It shows that there are two threads (pool-3 and pool-4) doing some work behind the scenes. I am not sure why this is happening; it's not coming from my code.
2021-02-21 12:16:00,057 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
2021-02-21 12:16:00,404 WARN [pool-3-thread-1] clients.NetworkClient (NetworkClient.java:757) - [Consumer clientId=consumer-consumer-tutorial-1, groupId=consumer-tutorial] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
2021-02-21 12:16:00,404 WARN [pool-3-thread-1] clients.NetworkClient$DefaultMetadataUpdater (NetworkClient.java:1033) - [Consumer clientId=consumer-consumer-tutorial-1, groupId=consumer-tutorial] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected
2021-02-21 12:16:01,057 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:70) - finished polling, got 0 records
2021-02-21 12:16:01,057 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
2021-02-21 12:16:02,057 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:70) - finished polling, got 0 records
2021-02-21 12:16:02,057 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
2021-02-21 12:16:02,427 INFO [pool-4-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
2021-02-21 12:16:02,923 WARN [pool-3-thread-1] clients.NetworkClient (NetworkClient.java:757) - [Consumer clientId=consumer-consumer-tutorial-1, groupId=consumer-tutorial] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
2021-02-21 12:16:02,924 WARN [pool-3-thread-1] clients.NetworkClient$DefaultMetadataUpdater (NetworkClient.java:1033) - [Consumer clientId=consumer-consumer-tutorial-1, groupId=consumer-tutorial] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected
2021-02-21 12:16:03,058 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:70) - finished polling, got 0 records
2021-02-21 12:16:03,058 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
2021-02-21 12:16:03,061 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:75) - error
java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
at org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:2452)
at org.apache.kafka.clients.consumer.KafkaConsumer.acquireAndEnsureOpen(KafkaConsumer.java:2436)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1217)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210)
at com.actimize.infrastructure.config.KafkaConsumerSample$1.run(KafkaConsumerSample.java:69)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "pool-3-thread-1" java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
at org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:2452)
at org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:2335)
at org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:2290)
at com.actimize.infrastructure.config.KafkaConsumerSample$1.run(KafkaConsumerSample.java:88)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2021-02-21 12:16:03,429 INFO [pool-4-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:70) - finished polling, got 0 records
2021-02-21 12:16:03,429 INFO [pool-4-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
Looking at the logs you've shared, two threads start polling at almost the same time:
2021-02-21 12:16:02,057 INFO [pool-3-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
2021-02-21 12:16:02,427 INFO [pool-4-thread-1] config.KafkaConsumerSample$1 (KafkaConsumerSample.java:68) - going to poll
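The `acquire()` frame in the stack trace suggests that the consumer guards its methods against concurrent callers. One possible explanation, which the snippet in the question does not confirm, is that `consumer` is a field shared by two runnables, so that one thread ends up polling while the other is still inside `poll()`. The sketch below illustrates such a guard with plain JDK classes. It is a simplified illustration of the idea, not Kafka's actual implementation, and the thread name is taken from the log only for readability.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

final class SingleThreadGuardSketch {

    // Thread currently inside the guarded object, or null when nobody is.
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    /** Claims the guard for the calling thread, or fails if any caller already holds it. */
    void acquire() {
        if (!owner.compareAndSet(null, Thread.currentThread())) {
            throw new ConcurrentModificationException("resource is not safe for multi-threaded access");
        }
    }

    void release() {
        owner.set(null);
    }

    /** Simulates a long poll(): holds the guard until mayLeave is counted down. */
    void poll(CountDownLatch entered, CountDownLatch mayLeave) {
        acquire();
        try {
            entered.countDown();
            mayLeave.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            release();
        }
    }

    /** Returns true if a second thread is rejected while the first is still inside poll(). */
    static boolean secondThreadIsRejected() {
        SingleThreadGuardSketch guard = new SingleThreadGuardSketch();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch mayLeave = new CountDownLatch(1);
        Thread first = new Thread(() -> guard.poll(entered, mayLeave), "pool-3-thread-1");
        first.start();
        boolean rejected = false;
        try {
            entered.await(); // the first thread is now inside poll()
            try {
                guard.acquire(); // second caller: the current thread
                guard.release();
            } catch (ConcurrentModificationException expected) {
                rejected = true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            mayLeave.countDown();
            try {
                first.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("second thread rejected: " + secondThreadIsRejected());
    }
}
```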
Extra measures need to be taken into consideration in order to implement a multi-threaded consumer.
The most important points that you may want to tackle are:
Ensure that records from the same partitions are processed only by one thread at a time
Commit offsets only after records are processed
Handle group rebalancing properly
Further reading: Kafka Consumer Multi Threaded Messaging
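For the first point, one common approach is to route every partition to its own single-thread executor, so that records of one partition are handled in order by one thread while different partitions run in parallel. The sketch below shows only that routing, with JDK classes; `Rec` is a hypothetical stand-in for a consumed record, and offset commits and rebalance handling are deliberately left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class PartitionAffinitySketch {

    /** Hypothetical stand-in for a consumed record: only the fields this sketch needs. */
    static final class Rec {
        final int partition;
        final long offset;

        Rec(int partition, long offset) {
            this.partition = partition;
            this.offset = offset;
        }
    }

    // One single-thread executor per partition.
    private final Map<Integer, ExecutorService> workers = new ConcurrentHashMap<>();

    /** Hands the record to the single worker thread that owns its partition. */
    void dispatch(Rec rec, Consumer<Rec> handler) {
        workers.computeIfAbsent(rec.partition, p -> Executors.newSingleThreadExecutor())
               .execute(() -> handler.accept(rec));
    }

    /** Stops accepting work and waits for queued records; returns true if all workers drained in time. */
    boolean shutdownAndWait(long timeoutMillis) {
        for (ExecutorService worker : workers.values()) {
            worker.shutdown();
        }
        boolean drained = true;
        for (ExecutorService worker : workers.values()) {
            try {
                drained &= worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return drained;
    }

    /** Dispatches three records to each of two partitions and returns the offsets seen per partition. */
    static Map<Integer, List<Long>> demo() {
        PartitionAffinitySketch dispatcher = new PartitionAffinitySketch();
        Map<Integer, List<Long>> seen = new ConcurrentHashMap<>();
        for (long offset = 0; offset < 3; offset++) {
            for (int partition = 0; partition < 2; partition++) {
                dispatcher.dispatch(new Rec(partition, offset),
                        r -> seen.computeIfAbsent(r.partition, p -> new CopyOnWriteArrayList<>()).add(r.offset));
            }
        }
        dispatcher.shutdownAndWait(5_000);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The thread that calls `poll()` stays the only one touching the consumer; only the processing is handed off.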
I have a Spring Boot App using GlobalKTable. It worked fine until the update to kafka-streams-5.5.0-css (Confluent Platform version compatible with Apache Kafka 2.5.0) from 5.3.2-css (Apache Kafka 2.3.1).
So this is my configuration:
@Configuration
@EnableKafkaStreams
public class GlobalTableConfiguration {
public GlobalTableConfiguration() {
}
@Bean
public GlobalKTable<String, String> table(StreamsBuilder kStreamsBuilder) {
return kStreamsBuilder.globalTable("topic1", Consumed.with(null, null),
Materialized.as("topic1-store"));
}
}
I'm getting the store like this:
streamsBuilderFactoryBean.getKafkaStreams().
store("topic1-store", QueryableStoreTypes.keyValueStore());
This fails with:
Request processing failed; nested exception is java.lang.IllegalStateException: KafkaStreams is not running. State is ERROR.
org.springframework.web.util.NestedServletException: Request processing failed; nested exception is java.lang.IllegalStateException: KafkaStreams is not running. State is ERROR.
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
Caused by: java.lang.IllegalStateException: KafkaStreams is not running. State is ERROR.
at org.apache.kafka.streams.KafkaStreams.validateIsRunningOrRebalancing(KafkaStreams.java:316)
at org.apache.kafka.streams.KafkaStreams.store(KafkaStreams.java:1182)
at org.apache.kafka.streams.KafkaStreams.store(KafkaStreams.java:1169)
I can see in the logs that the stream thread shuts down before this:
2020-06-16 13:22:46.943 INFO 72423 --- [ Test worker] o.a.kafka.common.utils.AppInfoParser : Kafka version: 2.5.0
2020-06-16 13:22:46.944 INFO 72423 --- [ Test worker] o.a.kafka.common.utils.AppInfoParser : Kafka commitId: 66563e712b0b9f84
2020-06-16 13:22:46.944 INFO 72423 --- [ Test worker] o.a.kafka.common.utils.AppInfoParser : Kafka startTimeMs: 1592299366943
2020-06-16 13:22:46.946 INFO 72423 --- [ad | producer-2] org.apache.kafka.clients.Metadata : [Producer clientId=producer-2] Cluster ID: aKrIp_7wQcqF9OlSUoBgSQ
2020-06-16 13:22:47.496 INFO 72423 --- [ Test worker] org.apache.kafka.streams.KafkaStreams : stream-client [app-d09c3f52-8d77-4814-944b-ba08b79ed8a4] State transition from ERROR to PENDING_SHUTDOWN
2020-06-16 13:22:47.497 INFO 72423 --- [ms-close-thread] o.a.k.s.p.internals.StreamThread : stream-thread [app-d09c3f52-8d77-4814-944b-ba08b79ed8a4-StreamThread-1] Informed to shut down
2020-06-16 13:22:47.497 INFO 72423 --- [ms-close-thread] o.a.k.s.p.internals.GlobalStreamThread : global-stream-thread [app-d09c3f52-8d77-4814-944b-ba08b79ed8a4-GlobalStreamThread] State transition from RUNNING to PENDING_SHUTDOWN
2020-06-16 13:22:47.557 INFO 72423 --- [balStreamThread] o.a.k.s.p.internals.GlobalStreamThread : global-stream-thread [app-d09c3f52-8d77-4814-944b-ba08b79ed8a4-GlobalStreamThread] Shutting down
2020-06-16 13:22:47.571 INFO 72423 --- [balStreamThread] o.a.k.s.p.internals.GlobalStreamThread : global-stream-thread [app-d09c3f52-8d77-4814-944b-ba08b79ed8a4-GlobalStreamThread] State transition from PENDING_SHUTDOWN to DEAD
2020-06-16 13:22:47.571 INFO 72423 --- [balStreamThread] o.a.k.s.p.internals.GlobalStreamThread : global-stream-thread [app-d09c3f52-8d77-4814-944b-ba08b79ed8a4-GlobalStreamThread] Shutdown complete
After some experiments I made it work by adding to my configuration:
@Bean
public KStream kStream(StreamsBuilder kStreamsBuilder) {
return kStreamsBuilder.stream("some-topic", Consumed.with(null, null));
}
So basically, when I have any KStream defined (consuming from any topic), the stream thread stays alive and everything works as it did before the upgrade.
My question is: what would be the correct way to do this without the useless bean (and topic)?
EDIT
There was a similar issue discussed here: Kafka Streams 2.5.0 requires input topic
It looks like this will be fixed in kafka-streams 2.5.1; until then, setting num.stream.threads: 0 is a nicer workaround than declaring a dummy stream.
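A minimal sketch of that workaround as plain Java properties is shown below. The key `num.stream.threads` is the string behind `StreamsConfig.NUM_STREAM_THREADS_CONFIG`; the class and method names are made up for illustration, and how the properties reach the Spring-managed factory bean depends on the application's setup and is not shown.

```java
import java.util.Properties;

final class GlobalTableOnlyConfigSketch {

    /** Builds only the workaround property; merge it into the application's existing streams config. */
    static Properties workaroundProperties() {
        Properties props = new Properties();
        // No regular stream threads: only the global stream thread serves the GlobalKTable.
        props.setProperty("num.stream.threads", "0");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(workaroundProperties());
    }
}
```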
This appears to have nothing to do with Spring and is caused by some internal changes in the kafka-streams classes.
This works fine with Boot 2.2.x (Kafka-streams 2.3.x).
@SpringBootApplication
@EnableKafkaStreams
public class So62406117Application {
public static void main(String[] args) {
SpringApplication.run(So62406117Application.class, args);
}
@Bean
public GlobalKTable<String, String> table(StreamsBuilder kStreamsBuilder) {
return kStreamsBuilder.globalTable("topic1", Consumed.with(null, null),
Materialized.as("topic1-store"));
}
@Bean
public ApplicationRunner runner(StreamsBuilderFactoryBean fb) {
return args -> {
ReadOnlyKeyValueStore<Object, Object> store =
fb.getKafkaStreams().store("topic1-store", QueryableStoreTypes.keyValueStore());
System.out.println(store);
};
}
@Bean
public NewTopic topic() {
return TopicBuilder.name("topic1").partitions(1).replicas(1).build();
}
}
But it fails with Boot 2.3 (Kafka-Streams 2.5.0).
We are definitely starting the KafkaStreams (in the factory bean's start() method), but during that start() we get:
java.lang.IllegalStateException: Consumer is not subscribed to any topics or assigned any partitions
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1228) ~[kafka-clients-2.5.0.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216) ~[kafka-clients-2.5.0.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:853) ~[kafka-streams-2.5.0.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:753) ~[kafka-streams-2.5.0.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:697) ~[kafka-streams-2.5.0.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:670) ~[kafka-streams-2.5.0.jar:na]
2020-06-16 17:44:02.700 INFO 10635 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [foo-235af8e6-6618-4e73-86ad-75307130004b-StreamThread-1] State transition from STARTING to PENDING_SHUTDOWN
2020-06-16 17:44:02.700 INFO 10635 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [foo-235af8e6-6618-4e73-86ad-75307130004b-StreamThread-1] Shutting down
2020-06-16 17:44:02.700 INFO 10635 --- [-StreamThread-1] o.a.k.clients.consumer.KafkaConsumer : [Consumer clientId=foo-235af8e6-6618-4e73-86ad-75307130004b-StreamThread-1-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
2020-06-16 17:44:02.700 INFO 10635 --- [-StreamThread-1] o.a.k.clients.producer.KafkaProducer : [Producer clientId=foo-235af8e6-6618-4e73-86ad-75307130004b-StreamThread-1-producer] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
2020-06-16 17:44:02.704 INFO 10635 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [foo-235af8e6-6618-4e73-86ad-75307130004b-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
2020-06-16 17:44:02.704 INFO 10635 --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams : stream-client [foo-235af8e6-6618-4e73-86ad-75307130004b] State transition from REBALANCING to ERROR
2020-06-16 17:44:02.704 ERROR 10635 --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams : stream-client [foo-235af8e6-6618-4e73-86ad-75307130004b] All stream threads have died. The instance will be in error state and should be closed.
2020-06-16 17:44:02.704 INFO 10635 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [foo-235af8e6-6618-4e73-86ad-75307130004b-StreamThread-1] Shutdown complete
I am trying to run the annotation function of GraphAware within Neo4j (see documentation here). I have a set of 5000 nodes (KnowledgeArticles) with textual data in the content property. To annotate those, I run the following query in Neo4j Desktop:
CALL apoc.periodic.iterate(
"MATCH (n:KnowledgeArticle) RETURN n",
"CALL ga.nlp.annotate({text: n.content, id: id(n)})
YIELD result MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)", {batchSize:1, iterateList:true})
After annotating approximately 200 to 300 KnowledgeArticles, the database shuts down and reports the error:
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure `apoc.periodic.iterate`: Caused by:
java.util.concurrent.RejectedExecutionException: Task
java.util.concurrent.FutureTask#373b81ee rejected from
java.util.concurrent.ThreadPoolExecutor#285a2901[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 288]
I have experimented using different values for batchSize or setting iterateList to false, but none of this helped.
Also, I have tried performing the above iterate call limited to only 150 nodes. This works fine the first time I call it, but when I run it a second time it gives the same error, reporting a completed tasks count of about 200 to 300. The processor in the background thus seems to 'remember' the total number of tasks it has run since the database was started.
Could you help me resolve this issue? I want to run the above query not necessarily from Neo4j Desktop, but eventually with py2neo from Python using graph.run([iterate-query]). So if there is any way of solving this from Python, that would be even better.
Thank you!
PS. The debug log provides the following output (the last few iterations of the annotation up to the shutdown):
2019-05-21 12:46:10.359+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] Start storing annotatedText 251906
2019-05-21 12:46:13.784+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] end storing annotatedText 251906. It took: 3425
2019-05-21 12:46:13.786+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:13.788+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:13.800+0000 INFO [c.g.n.u.ProcessorUtils] Taking default pipeline from configuration : myPipeline
2019-05-21 12:46:13.868+0000 INFO [c.g.n.p.s.StanfordTextProcessor] Time for pipeline annotation (myPipeline): 67. Text length: 954
2019-05-21 12:46:13.869+0000 INFO [c.g.n.NLPManager] Time to annotate 68
2019-05-21 12:46:13.869+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:13.869+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] Start storing annotatedText 251907
2019-05-21 12:46:15.848+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] end storing annotatedText 251907. It took: 1978
2019-05-21 12:46:15.848+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:15.862+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:15.915+0000 INFO [c.g.n.u.ProcessorUtils] Taking default pipeline from configuration : myPipeline
2019-05-21 12:46:16.294+0000 INFO [c.g.n.p.s.StanfordTextProcessor] Time for pipeline annotation (myPipeline): 378. Text length: 2641
2019-05-21 12:46:16.295+0000 INFO [c.g.n.NLPManager] Time to annotate 379
2019-05-21 12:46:16.296+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:16.296+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] Start storing annotatedText 251908
2019-05-21 12:46:16.421+0000 INFO [o.n.k.a.DatabaseAvailabilityGuard] Database graph.db is unavailable.
2019-05-21 12:46:17.018+0000 INFO [c.g.s.f.b.GraphAwareServerBootstrapper] stopped
2019-05-21 12:46:17.020+0000 INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutdown started
2019-05-21 12:46:17.149+0000 INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutting down 'graph.db' database.
2019-05-21 12:46:17.150+0000 INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutdown started
2019-05-21 12:46:17.164+0000 INFO [o.n.b.i.BackupServer] BackupServer communication server shutting down and unbinding from /127.0.0.1:6362
2019-05-21 12:46:17.226+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by database shutdown # txId: 7720 checkpoint started...
2019-05-21 12:46:17.247+0000 INFO [o.n.k.i.s.c.CountsTracker] Rotated counts store at transaction 7720 to [/Users/{my.user.name}/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-e2babea7-0332-4c2c-bf1d-076d4feed49a/installation-3.5.4/data/databases/graph.db/neostore.counts.db.a], from [/Users/{my.user.name}/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-e2babea7-0332-4c2c-bf1d-076d4feed49a/installation-3.5.4/data/databases/graph.db/neostore.counts.db.b].
2019-05-21 12:46:17.644+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by database shutdown # txId: 7720 checkpoint completed in 418ms
2019-05-21 12:46:17.647+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] No log version pruned, last checkpoint was made in version 3
2019-05-21 12:46:17.698+0000 INFO [o.n.i.d.DiagnosticsManager] --- STOPPING diagnostics START ---
2019-05-21 12:46:17.700+0000 INFO [o.n.i.d.DiagnosticsManager] --- STOPPING diagnostics END ---
2019-05-21 12:46:17.706+0000 INFO [c.g.r.BaseGraphAwareRuntime] Shutting down GraphAware Runtime...
2019-05-21 12:46:17.709+0000 INFO [c.g.r.m.BaseModuleManager] Shutting down module UIDM
2019-05-21 12:46:17.709+0000 INFO [c.g.r.m.BaseModuleManager] Shutting down module NLP
2019-05-21 12:46:17.712+0000 INFO [c.g.r.s.RotatingTaskScheduler] Terminating task scheduler...
2019-05-21 12:46:17.712+0000 INFO [c.g.r.s.RotatingTaskScheduler] Task scheduler terminated successfully.
2019-05-21 12:46:17.714+0000 INFO [c.g.r.BaseGraphAwareRuntime] GraphAware Runtime shut down.
I'm using Kafka Streams v. 0.10.2.0 for streaming between topics with simple processing. Recently I had an issue where one of the brokers went down and the Kafka Streams app shut down and stayed down until I manually restarted it. While trying to debug this issue, I can't work out from the logs what exactly caused it. Here is the log excerpt:
INFO [StreamThread-1] o.a.k.c.c.i.ConsumerCoordinator - Revoking previously assigned partitions [topicname-3, topicname-1, topicname-2] for group streams-group
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] partitions [[topicname-3, topicname-1, topicname-2]] revoked at the beginning of consumer rebalance.
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Closing a task's topology 0_1
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Closing a task's topology 0_2
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Closing a task's topology 0_3
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Flushing state stores of task 0_1
INFO [kafka-coordinator-heartbeat-thread | streams-group] o.a.k.c.c.i.AbstractCoordinator - Marking the coordinator 127.0.0.1:9092 dead for group streams-group
INFO [kafka-coordinator-heartbeat-thread | streams-group] o.a.k.c.c.i.AbstractCoordinator - Discovered coordinator 127.0.0.1:9092 for group streams-group.
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Flushing state stores of task 0_2
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Flushing state stores of task 0_3
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Committing consumer offsets of task 0_1
ERROR [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Failed while executing StreamTask 0_1 due to commit consumer offsets:
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Updating suspended tasks to contain active tasks [[0_1, 0_2, 0_3]]
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Removing all active tasks [[0_1, 0_2, 0_3]]
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Removing all standby tasks [[]]
ERROR [StreamThread-1] o.a.k.c.c.i.ConsumerCoordinator - User provided listener org.apache.kafka.streams.processor.internals.StreamThread$1 for group streams-group failed on partition revocation
INFO [StreamThread-1] o.a.k.c.c.i.AbstractCoordinator - (Re-)joining group streams-group
INFO [StreamThread-1] o.a.k.c.c.i.AbstractCoordinator - Marking the coordinator dead for group streams-group
INFO [StreamThread-1] o.a.k.c.c.i.AbstractCoordinator - Discovered coordinator for group streams-group.
INFO [StreamThread-1] o.a.k.c.c.i.AbstractCoordinator - (Re-)joining group streams-group
INFO [StreamThread-1] o.a.k.s.p.i.StreamPartitionAssignor - stream-thread [StreamThread-1] Constructed client metadata ...
INFO [StreamThread-1] o.a.k.s.p.i.StreamPartitionAssignor - stream-thread [StreamThread-1] Completed validating internal topics in partition assignor
INFO [StreamThread-1] o.a.k.s.p.i.StreamPartitionAssignor - stream-thread [StreamThread-1] Completed validating internal topics in partition assignor
INFO [StreamThread-1] o.a.k.s.p.i.StreamPartitionAssignor - stream-thread [StreamThread-1] Assigned tasks to clients as {...=[activeTasks: ([0_0, 0_4]) assignedTasks: ([0_0, 0_4]) prevActiveTasks: ([]) prevAssignedTasks: ([]) capacity: 1.0 cost: 0.2], ...=[activeTasks: ([0_1, 0_2, 0_3]) assignedTasks: ([0_1, 0_2, 0_3]) prevActiveTasks: ([]) prevAssignedTasks: ([]) capacity: 1.0 cost: 0.30000000000000004]}.
INFO [StreamThread-1] o.a.k.c.c.i.AbstractCoordinator - Successfully joined group streams-group with generation 17
INFO [StreamThread-1] o.a.k.c.c.i.ConsumerCoordinator - Setting newly assigned partitions [topicname-3, topicname-1, topicname-2] for group streams-group
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] New partitions [[topicname-3, topicname-1, topicname-2]] assigned at the end of consumer rebalance.
INFO [StreamThread-1] o.a.k.s.p.i.StreamTask - task [0_1] Initializing processor nodes of the topology
INFO [StreamThread-1] o.a.k.s.p.i.StreamTask - task [0_2] Initializing processor nodes of the topology
INFO [StreamThread-1] o.a.k.s.p.i.StreamTask - task [0_3] Initializing processor nodes of the topology
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Shutting down
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Closing a task 0_1
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Closing a task 0_2
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Closing a task 0_3
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Flushing state stores of task 0_1
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Flushing state stores of task 0_2
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Flushing state stores of task 0_3
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Closing the state manager of task 0_1
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Closing the state manager of task 0_2
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Closing the state manager of task 0_3
INFO [StreamThread-1] o.a.k.c.p.KafkaProducer - Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Removing all active tasks [[0_1, 0_2, 0_3]]
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Removing all standby tasks [[]]
INFO [StreamThread-1] o.a.k.s.p.i.StreamThread - stream-thread [StreamThread-1] Stream thread shutdown complete
WARN [StreamThread-1] o.a.k.s.p.i.StreamThread - Unexpected state transition from RUNNING to NOT_RUNNING
First of all, it seems very unlikely that processing was taking a long time, because it is very simple and the app had been running for a couple of months with no messages like that in the logs.
Also, judging from the logs, Kafka Streams successfully rejoined the group but then suddenly just shut down without an exception. I had two streams apps running on different machines, and both shut down at the same time when the broker restarted.
How do I debug this problem? Shouldn't it throw an exception at least?
Another issue is that while the streams thread shut down, the rest of the app kept working fine, so it wasn't restarted automatically. Can I catch this somehow and restart the thread? The retention policy makes it very undesirable for a consumer to go down, so how can I make the Kafka Streams app more reliable?
Thanks!
It's hard to say from the log. Maybe a DEBUG log would reveal more information...
The only "shot in the dark" might be that there was an error during "Initializing processor nodes of the topology". But if there was an exception, it should actually be in the log. It could also be a bug in the library.
About monitoring your application, you have multiple options:
you can register a KafkaStreams#setUncaughtExceptionHandler() to see if an exception bubbles out of a StreamThread and thus the thread dies
you can register a KafkaStreams#setStateListener() to see if the app goes into NOT_RUNNING state (btw: there is one known issue with the NOT_RUNNING state in 0.10.2 and 0.11.0, just fixed in trunk: if all threads are dead, the state might still be RUNNING, so you should manually monitor the number of threads that are still running)
Btw: I would recommend upgrading to 0.10.2.1, which contains multiple important bug fixes.
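To make the first monitoring option a bit more concrete, here is a minimal sketch of the kind of handler you could register. As far as I know, in the 0.10.x line KafkaStreams#setUncaughtExceptionHandler() accepts a plain java.lang.Thread.UncaughtExceptionHandler, so the sketch uses only the JDK. The worker thread below is a stand-in for a StreamThread, and the names DeadThreadMonitor and RecordingHandler are made up for illustration; they are not part of Kafka.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class DeadThreadMonitor {

    /** Hypothetical handler: remembers which thread died and why, so the app can alert or restart. */
    public static final class RecordingHandler implements Thread.UncaughtExceptionHandler {
        public final AtomicReference<Throwable> lastError = new AtomicReference<>();
        public final AtomicReference<String> deadThreadName = new AtomicReference<>();
        public final CountDownLatch died = new CountDownLatch(1);

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            // Called on the dying thread itself, just before it terminates.
            deadThreadName.set(t.getName());
            lastError.set(e);
            died.countDown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RecordingHandler handler = new RecordingHandler();

        // Stand-in for a StreamThread whose processing fails (assumption: simulated failure).
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("simulated processing failure");
        }, "StreamThread-1");

        // With Kafka Streams 0.10.x you would instead pass the handler to
        // KafkaStreams#setUncaughtExceptionHandler() before calling start().
        worker.setUncaughtExceptionHandler(handler);
        worker.start();

        // The rest of the application can wait for (or poll) the "thread died" signal.
        boolean observed = handler.died.await(5, TimeUnit.SECONDS);
        worker.join();

        System.out.println("death observed: " + observed);
        System.out.println("thread alive:   " + worker.isAlive());
        System.out.println("dead thread:    " + handler.deadThreadName.get());
        System.out.println("cause:          " + handler.lastError.get());
    }
}
```

Note that a handler like this only fires when an exception actually escapes the thread. In the log above the thread shut down without one, so you would still want the manual "how many threads are alive" check mentioned in the second option.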