Is there a "Circuit Breaker" for Spring Boot Kafka client? - java

In case the Kafka server is (temporarily) down, my Spring Boot application's ReactiveKafkaConsumerTemplate keeps trying to connect, unsuccessfully, causing unnecessary traffic and cluttering the log files:
2021-11-10 14:45:30.265 WARN 24984 --- [onsumer-group-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-group-1, groupId=consumer-group] Connection to node -1 (localhost/127.0.0.1:29092) could not be established. Broker may not be available.
2021-11-10 14:45:32.792 WARN 24984 --- [onsumer-group-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-group-1, groupId=consumer-group] Bootstrap broker localhost:29092 (id: -1 rack: null) disconnected
2021-11-10 14:45:34.845 WARN 24984 --- [onsumer-group-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-group-1, groupId=consumer-group] Connection to node -1 (localhost/127.0.0.1:29092) could not be established. Broker may not be available.
2021-11-10 14:45:34.845 WARN 24984 --- [onsumer-group-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-group-1, groupId=consumer-group] Bootstrap broker localhost:29092 (id: -1 rack: null) disconnected
Is it possible to use something like a circuit breaker (for inspiration, see here or here), so that the Spring Boot Kafka client, after a failure (or even better, after a few consecutive failures), slows down the pace of its connection attempts and returns to the normal pace only once the server is up again?
Is there a ready-made config parameter, or any other solution?
I am aware of the parameter reconnect.backoff.ms; this is how I create the ReactiveKafkaConsumerTemplate bean:
@Bean
public ReactiveKafkaConsumerTemplate<String, MyEvent> kafkaConsumer(KafkaProperties properties) {
    final Map<String, Object> map = new HashMap<>(properties.buildConsumerProperties());
    map.put(ConsumerConfig.GROUP_ID_CONFIG, "MyGroup");
    map.put(ConsumerConfig.RECONNECT_BACKOFF_MS_CONFIG, 10_000L);
    final JsonDeserializer<MyEvent> jsonDeserializer = new JsonDeserializer<>();
    jsonDeserializer.addTrustedPackages("com.example.myapplication");
    return new ReactiveKafkaConsumerTemplate<>(
            ReceiverOptions
                    .<String, MyEvent>create(map)
                    .withKeyDeserializer(new ErrorHandlingDeserializer<>(new StringDeserializer()))
                    .withValueDeserializer(new ErrorHandlingDeserializer<>(jsonDeserializer))
                    .subscription(List.of("MyTopic")));
}
And still, the consumer tries to connect every 3 seconds.

See https://kafka.apache.org/documentation/#consumerconfigs_reconnect.backoff.ms
The base amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all connection attempts by the client to a broker.
and https://kafka.apache.org/documentation/#consumerconfigs_reconnect.backoff.max.ms
The maximum amount of time in milliseconds to wait when reconnecting to a broker that has repeatedly failed to connect. If provided, the backoff per host will increase exponentially for each consecutive connection failure, up to this maximum. After calculating the backoff increase, 20% random jitter is added to avoid connection storms.
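In other words, the reconnect backoff is already exponential once reconnect.backoff.max.ms is set; no circuit breaker is needed for this. A minimal sketch of applying both settings to the properties map from the question (the 120-second cap is an illustrative value, not a recommendation):

map.put(ConsumerConfig.RECONNECT_BACKOFF_MS_CONFIG, 10_000L);       // base wait before a reconnect attempt
map.put(ConsumerConfig.RECONNECT_BACKOFF_MAX_MS_CONFIG, 120_000L);  // cap for the exponential backoff (illustrative)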

Related

Kafka template connection with broker

I've got a @KafkaListener method in my service which processes messages and sends them to another topic using KafkaTemplate, and from time to time it completely stops working for some reason.
2022-10-04 16:53:18.218 ERROR 1 --- [pool-1-thread-2] o.s.k.support.LoggingProducerListener : Exception thrown when sending a message with key='null' and payload='{"type":"INFORMATION","messageId":"f39fabfd-e560-499b-9850-440ad811657b","phoneNumber":"+100000000...' to topic ss.fb.processing-notifications.send:
org.apache.kafka.common.errors.TimeoutException: Topic ss.fb.processing-notifications.send not present in metadata after 60000 ms.
2022-10-04 16:53:33.013 INFO 1 --- [ad | producer-1] org.apache.kafka.clients.NetworkClient : [Producer clientId=producer-1] Disconnecting from node -2 due to socket connection setup timeout. The timeout value is 29794 ms.
2022-10-04 16:53:33.014 WARN 1 --- [ad | producer-1] org.apache.kafka.clients.NetworkClient : [Producer clientId=producer-1] Bootstrap broker prd-mqueue-srv2.obi.ru:9092 (id: -2 rack: null) disconnected
2022-10-04 16:53:41.005 INFO 1 --- [ntainer#0-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-nepcNotificationsGroup-1, groupId=nepcNotificationsGroup] Disconnecting from node -3 due to socket connection setup timeout. The timeout value is 27831 ms.
2022-10-04 16:53:41.005 WARN 1 --- [ntainer#0-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-nepcNotificationsGroup-1, groupId=nepcNotificationsGroup] Bootstrap broker prd-mqueue-srv3.obi.ru:9092 (id: -3 rack: null) disconnected
There seem to be some network issues; however, after restarting the service everything works fine again. Still, I wonder why the broker eventually ends up disconnected. Isn't the producer supposed to keep trying to send the message to the broker until it succeeds?
You can wrap the KafkaTemplate call in a RetryTemplate or a @Retryable method - see https://github.com/spring-projects/spring-retry - the RetryTemplate is already on the class path as a transitive dependency of spring-kafka.
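A minimal sketch of the RetryTemplate approach (the class name, topic, and retry values here are illustrative assumptions, not from the question):

import java.util.concurrent.TimeUnit;

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.retry.support.RetryTemplate;

public class RetryingSender {

    private final KafkaTemplate<String, String> kafkaTemplate;

    // 5 attempts with exponential backoff: 1 s, 2 s, 4 s, ... capped at 30 s
    private final RetryTemplate retryTemplate = RetryTemplate.builder()
            .maxAttempts(5)
            .exponentialBackoff(1_000, 2.0, 30_000)
            .build();

    public RetryingSender(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void sendWithRetry(String topic, String payload) throws Exception {
        // Block on the send so a TimeoutException surfaces here and triggers a retry.
        retryTemplate.execute(ctx ->
                kafkaTemplate.send(topic, payload).get(30, TimeUnit.SECONDS));
    }
}

Blocking on the returned future is what makes the "not present in metadata after 60000 ms" TimeoutException visible to the retry logic; a fire-and-forget send fails silently from the caller's point of view.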

Unable to connect Spring Boot application to Kafka topic containing CDC events

I am having trouble connecting my basic Spring Boot consumer application to Apache Kafka, which contains CDC events from Debezium. My consumer application runs separately on my host, while Kafka runs in Docker with the following setup:
kafka:
  image: quay.io/debezium/kafka
  ports:
    - 29092:29092
    - 19092:19092
  links:
    - zookeeper
  environment:
    - ZOOKEEPER_CONNECT=zookeeper:2181
    - LOG_LEVEL=DEBUG
    - LISTENERS=PLAINTEXT://:9092, CONNECTIONS_FROM_HOST://:29092, CONNECTIONS_FROM_OUTSIDE://:19092
    - ADVERTISED_LISTENERS= PLAINTEXT://kafka:9092, CONNECTIONS_FROM_HOST://localhost:29092, CONNECTIONS_FROM_OUTSIDE://{ec2 instance ip address}:19092
    - LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT, CONNECTIONS_FROM_HOST:PLAINTEXT, CONNECTIONS_FROM_OUTSIDE:PLAINTEXT
  restart: on-failure
I already have a topic named "product_sb" in my Kafka broker; it contains CDC events from Debezium that look like this:
Key (166 bytes): {"schema":{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"}],"optional":false,"name":"dbserver1.inventory.product_sb.Key"},"payload":{"id":1}}
Value (2443 bytes): {"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"},{"type":"string","optional":true,"field":"name"},{"type":"double","optional":false,"field":"price"},{"type":"int32","optional":false,"field":"quantity"}],"optional":true,"name":"dbserver1.inventory.product_sb.Value","field":"before"},{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"},{"type":"string","optional":true,"field":"name"},{"type":"double","optional":false,"field":"price"},{"type":"int32","optional":false,"field":"quantity"}],"optional":true,"name":"dbserver1.inventory.product_sb.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false,incremental"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":true,"field":"sequence"},{"type":"string","optional":true,"field":"table"},{"type":"int64","optional":false,"field":"server_id"},{"type":"string","optional":true,"field":"gtid"},{"type":"string","optional":false,"field":"file"},{"type":"int64","optional":false,"field":"pos"},{"type":"int32","optional":false,"field":"row"},{"type":"int64","optional":true,"field":"thread"},{"type":"string","optional":true,"field":"query"}],"optional":false,"name":"io.debezium.connector.mysql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"id"},{"type":"int64","optional":false,"field":"total_order"},{"type":"int64","optional":false,"field":"data_collection_order"}],"optional":true,"field":"transaction"}],"optional":false,"name":"dbserver1.inventory.product_sb.Envelope"},"payload":{"before":null,"after":{"id":1,"name":"Laptop","price":170000.0,"quantity":1},"source":{"version":"1.9.5.Final","connector":"mysql","name":"dbserver1","ts_ms":1664161694254,"snapshot":"true","db":"inventory","sequence":null,"table":"product_sb","server_id":0,"gtid":null,"file":"mysql-bin.000002","pos":1684,"row":0,"thread":null,"query":null},"op":"r","ts_ms":1664161694258,"transaction":null}}
Partition: 0 Offset: 0
The following is the Spring Boot consumer application that is trying to connect to Kafka:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumer {

    private static final Logger LOGGER = LoggerFactory.getLogger(KafkaConsumer.class);

    @KafkaListener(topics = "product_sb")
    public void consume(String message) {
        LOGGER.info(String.format("Message received --> %s", message));
    }
}
The following is the application.properties file for the consumer:
#CONFIGURING CONSUMER
#Configuring address of the kafka server
spring.kafka.consumer.bootstrap-servers: localhost:29092
#Configuring consumer group
spring.kafka.consumer.group-id: mygroup
#Configure offset for consumer
spring.kafka.consumer.auto-offset-reset: earliest
#Key Value deserializer
spring.kafka.consumer.key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
While running the Spring Boot application, I am getting the following error:
org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-mygroup-1, groupId=mygroup] Node -1 disconnected.
org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-mygroup-1, groupId=mygroup] Cancelled in-flight API_VERSIONS request with correlation id 1 due to node -1 being disconnected (elapsed time since creation: 31ms, elapsed time since send: 31ms, request timeout: 30000ms)
org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-mygroup-1, groupId=mygroup] Bootstrap broker localhost:29092 (id: -1 rack: null) disconnected
I have followed many articles but couldn't get my application working. I initially tried to connect my Spring Boot application running on my host to the Kafka broker running on an EC2 instance, but had the same issue.
Can anyone suggest a solution for this?
It would also be useful if someone could help me with the serialization and deserialization of the Debezium events stored in the Kafka topic. The following is the configuration of the source connector I have used:
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "proxysql",
    "database.port": "6033",
    "database.user": "root",
    "database.password": "root",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory",
    "transforms": "route",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
    "transforms.route.replacement": "$3",
    "gtid.new.channel.position": "earliest"
  }
}
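On the deserialization part of the question: since the consumer above uses StringDeserializer for the value, one simple option (a sketch, not an official Debezium client API) is to parse the JSON envelope with Jackson and read the payload.after field visible in the sample event:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class CdcEnvelopeParser {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Returns the row state after the change (payload.after in the Debezium
    // envelope); for delete events this node is a JSON null.
    public static JsonNode afterState(String message) throws Exception {
        JsonNode envelope = MAPPER.readTree(message);
        return envelope.path("payload").path("after");
    }
}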

The Kafka script kafka-consumer-groups.sh throws "Timed out waiting for a node assignment"

I have a Kafka cluster running with 3 brokers.
<node-ip>:10092 //broker-0
<node-ip>:10093 //broker-1
<node-ip>:10094 //broker-2
Broker-1 (<node-ip>:10093) is in a not-ready state (due to some readiness failure), but the other 2 brokers are running fine.
But when I use the kafka-consumer-groups.sh script with a running broker's address as the bootstrap server, I get the following error.
kafka#mirror-maker-0:/opt/kafka/bin$ /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server <node-ip>:10094 --describe --group c2-c1-consumer-group --state
[2022-03-14 10:24:16,008] WARN [AdminClient clientId=adminclient-1] Connection to node 1 (/<node-ip>:10093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2022-03-14 10:24:17,086] WARN [AdminClient clientId=adminclient-1] Connection to node 1 (/<node-ip>:10093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2022-03-14 10:24:18,206] WARN [AdminClient clientId=adminclient-1] Connection to node 1 (/<node-ip>:10093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2022-03-14 10:24:19,458] WARN [AdminClient clientId=adminclient-1] Connection to node 1 (/<node-ip>:10093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Error: Executing consumer group command failed due to org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeGroups(api=DESCRIBE_GROUPS)
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeGroups(api=DESCRIBE_GROUPS)
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$describeConsumerGroups$1(ConsumerGroupCommand.scala:543)
at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
at scala.collection.convert.JavaCollectionWrappers$AbstractJMapWrapper.map(JavaCollectionWrappers.scala:309)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.describeConsumerGroups(ConsumerGroupCommand.scala:542)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.collectGroupsState(ConsumerGroupCommand.scala:620)
at kafka.admin.ConsumerGroupCommand$ConsumerGroupService.describeGroups(ConsumerGroupCommand.scala:373)
at kafka.admin.ConsumerGroupCommand$.run(ConsumerGroupCommand.scala:72)
at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:59)
at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeGroups(api=DESCRIBE_GROUPS)
Could someone please help me understand:
Why is it connecting to a broker I did not mention (the log shows :10093, but I passed :10094)?
Is there any way to use only the mentioned bootstrap servers?
One more thing: when I run kafka-topics.sh with the running broker's address as the bootstrap server, it returns a response.
Thanks
I faced a similar issue: I was able to read the topics, but I could not list or describe the groups. I solved the issue by adding a large timeout. Can you please also try a large timeout? As for why the tool contacts a broker you did not mention: the bootstrap server is only used for the initial metadata fetch, and describing a group then requires contacting that group's coordinator, which here appears to be the unreachable broker on :10093. kafka-topics.sh works because topic metadata can be served by any broker.
./kafka-consumer-groups.sh --command-config kafka.properties --bootstrap-server brokers --group group --describe --timeout 100000

Kafka consumer tries to connect to a random hostname instead of the right one

I'm new to Kafka and started exploring with a sample program. It used to work without any issue, but all of a sudden the consumer.poll() call hangs and never returns. Googling suggested checking that the servers are accessible. The producer and consumer Java code runs on the same machine; the producer is able to post records to Kafka, but the consumer's poll method hangs.
Environment:
Kafka version: 1.1.0
Client: Java
Runs in an Ubuntu Docker container inside Windows
ZooKeeper and 2 broker servers run in the same container
When I enabled logging for the client code, I saw the exception below:
2018-07-06 21:24:18 DEBUG NetworkClient:802 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Error connecting to node 4bdce773eb74:9095 (id: 2 rack: null)
java.io.IOException: Can't resolve address: 4bdce773eb74:9095
at org.apache.kafka.common.network.Selector.doConnect(Selector.java:235)
at org.apache.kafka.common.network.Selector.connect(Selector.java:214)
.................
.................
I'm not sure why the consumer is trying to connect to 4bdce773eb74, even though my broker servers are 192.168.99.100:9094,192.168.99.100:9095. Here is my full consumer code:
final String BOOTSTRAP_SERVERS = "192.168.99.100:9094,192.168.99.100:9095";
final Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
props.put(ConsumerConfig.GROUP_ID_CONFIG, "Event_Consumer");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

KafkaConsumer<Long, String> consumer = new KafkaConsumer<Long, String>(props);

// Manually assign partitions instead of subscribing to the topics
TopicPartition tpLogin = new TopicPartition("login1", 0);
TopicPartition tpLogout = new TopicPartition("logout1", 1);
List<TopicPartition> tps = Arrays.asList(tpLogin, tpLogout);
consumer.assign(tps);

while (true) {
    final ConsumerRecords<Long, String> consumerRecords = consumer.poll(1000);
    if (consumerRecords.count() == 0) {
        continue;
    }
    consumerRecords.forEach(record -> {
        System.out.printf("Consumer Record:(%d, %s, %d, %d)\n", record.key(), record.value(),
                record.partition(), record.offset());
    });
    consumer.commitAsync();
    Thread.sleep(5000);
}
Please help with this issue.
EDIT
As I said earlier, I have 2 brokers, say broker-1 and broker-2. If I stop broker-1, the above exception is not logged, but the poll() method still doesn't return.
The message below is logged indefinitely if I stop broker-1:
2018-07-07 11:31:24 DEBUG AbstractCoordinator:579 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Sending FindCoordinator request to broker 192.168.99.100:9094 (id: 1 rack: null)
2018-07-07 11:31:24 DEBUG AbstractCoordinator:590 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Received FindCoordinator response ClientResponse(receivedTimeMs=1530943284196, latencyMs=2, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, clientId=consumer-1, correlationId=573), responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null)))
2018-07-07 11:31:24 DEBUG AbstractCoordinator:613 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Group coordinator lookup failed: The coordinator is not available.
2018-07-07 11:31:24 DEBUG AbstractCoordinator:227 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Coordinator discovery failed, refreshing metadata
Thanks in Advance,
Soman
I found the issue. When I created the topic, broker-0 (port 9093, broker id 0) and broker-2 (port 9094, broker id 2) were running. Today I mistakenly started broker-1 (port 9095, broker id 1) and broker-2. Stopping broker-1 and starting broker-0 resolved the issue; now the consumer is able to get the events.
It was definitely human error on my side, but I have 2 comments:
I think Kafka should gracefully use broker-2 (port 9094) and ignore broker-1 (port 9095).
Why is Kafka trying to contact 4bdce773eb74:9095 instead of the right IP address (192.168.99.100)?
Thanks.
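On the second comment: 4bdce773eb74 is most likely the Docker container ID. Clients only use the bootstrap address for the first metadata request; after that, they connect to whatever addresses the brokers advertise, and a broker with no advertised listener configured advertises its own hostname, which inside a container is the container ID. A sketch of the broker-side server.properties fix, with illustrative values:

listeners=PLAINTEXT://0.0.0.0:9095
advertised.listeners=PLAINTEXT://192.168.99.100:9095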

AMQP channel shutdown but consumer does not always restart

I have frequent Channel shutdown: connection error issues (under the 24.133.241:5671 thread; the name is truncated) in the RabbitMQ Java client (my producer and consumer are far apart). Most of the time the consumer is automatically restarted, as I have enabled heartbeats (15 seconds). However, there were some instances with only Channel shutdown: connection error, but no Consumer raised exception and no Restarting Consumer (under the cTaskExecutor-4 thread).
My current workaround is to restart my application. Can anyone shed some light on this matter?
2017-03-20 12:42:38.856 ERROR 24245 --- [24.133.241:5671] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error
2017-03-20 12:42:39.642 WARN 24245 --- [cTaskExecutor-4] o.s.a.r.l.SimpleMessageListenerContainer : Consumer raised exception, processing can restart if the connection factory supports it
...
2017-03-20 12:42:39.642 INFO 24245 --- [cTaskExecutor-4] o.s.a.r.l.SimpleMessageListenerContainer : Restarting Consumer: tags=[{amq.ctag-4CqrRsUP8plDpLQdNcOjDw=21-05060179}], channel=Cached Rabbit Channel: AMQChannel(amqp://21-05060179@10.24.133.241:5671/,1), conn: Proxy@7ec31754 Shared Rabbit Connection: SimpleConnection@44bac9ec [delegate=amqp://21-05060179@10.24.133.241:5671/], acknowledgeMode=NONE local queue size=0
Generally, this is due to the consumer thread being "stuck" in user code somewhere, so it can't react to the broken connection.
If you have network issues, perhaps it's stuck reading or writing to a socket; make sure you have timeouts set for any I/O operations.
Next time it happens, take a thread dump to see what the consumer threads are doing.
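For the second point, a minimal sketch of setting explicit timeouts on the underlying RabbitMQ ConnectionFactory (the host, port, and timeout values are illustrative assumptions, not taken from the question):

import com.rabbitmq.client.ConnectionFactory;

import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;

public class RabbitTimeoutConfig {

    // Explicit timeouts so a dead TCP connection cannot block a consumer thread forever.
    public static CachingConnectionFactory connectionFactory() {
        ConnectionFactory rabbit = new ConnectionFactory();
        rabbit.setHost("10.24.133.241");      // illustrative
        rabbit.setPort(5671);
        rabbit.setConnectionTimeout(10_000);  // TCP connect timeout (ms)
        rabbit.setHandshakeTimeout(10_000);   // AMQP handshake timeout (ms)
        rabbit.setRequestedHeartbeat(15);     // seconds, matching the question's setting
        return new CachingConnectionFactory(rabbit);
    }
}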
