Spring-Kafka consumer doesn't receive messages - java

I don't know what's going on: my Java client consumer, annotated with @KafkaListener, doesn't receive any messages. When I create a consumer via the command line it works. The producer (also in Java) works as expected too. Could someone help me understand this behavior?
application.yml
kafka:
  bootstrap-servers: localhost:9092
  topic: my-topic
producer config:
@Configuration
public class KafkaProducerConfig {

    @Value("${kafka.bootstrap-servers}")
    private String bootstrapServers;

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> configProps = new HashMap<>();
        configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(configProps);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}
consumer config:
@EnableKafka
@Configuration
class KafkaConsumerConfig {

    @Value("${kafka.bootstrap-servers}")
    String bootstrapServers;

    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        return new DefaultKafkaConsumerFactory<>(props);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        return factory;
    }
}
Producer & Consumer:
@Service
class Producer {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    @Value("${kafka.topic}")
    String kafkaTopic;

    public void send(String payload) {
        System.out.println("sending " + payload + " to " + kafkaTopic);
        kafkaTemplate.send(kafkaTopic, payload);
    }
}

@Service
public class Consumer {

    @KafkaListener(topics = "${kafka.topic}")
    public void receive(String payload) {
        System.out.println(payload + " aaaaaaaaaaaaaaaaaaaaaaaaaaa");
    }
}
Spring controller:
@RestController
@RequestMapping(value = "/kafka")
class WebRestController {

    @Autowired
    Producer producer;

    @GetMapping(value = "/producer")
    public String producer(String data) {
        producer.send(data);
        return "Done";
    }
}
This is my console output. As you can see, it sends a message but the listener method never receives anything. It works if I'm not using spring-kafka, just the plain Kafka client API. It also works when I start a consumer on the command line: there I see the messages sent by the Java producer.
2018-04-03 13:43:41.688 INFO 8068 --- [ main] o.a.k.clients.consumer.ConsumerConfig : ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [localhost:9092]
check.crcs = true
client.id =
connections.max.idle.ms = 540000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id =
heartbeat.interval.ms = 3000
interceptor.classes = null
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 305000
retry.backoff.ms = 100
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
2018-04-03 13:43:41.743 INFO 8068 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka version : 0.11.0.0
2018-04-03 13:43:41.743 INFO 8068 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : cb8625948210849f
2018-04-03 13:43:41.774 INFO 8068 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2018-04-03 13:43:41.777 INFO 8068 --- [ main] kafka.KafkaExample : Started KafkaExample in 3.653 seconds (JVM running for 4.195)
2018-04-03 13:43:47.245 INFO 8068 --- [nio-8080-exec-3] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring FrameworkServlet 'dispatcherServlet'
2018-04-03 13:43:47.245 INFO 8068 --- [nio-8080-exec-3] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization started
2018-04-03 13:43:47.264 INFO 8068 --- [nio-8080-exec-3] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization completed in 19 ms
sending Hello to my-topic
2018-04-03 13:43:47.300 INFO 8068 --- [nio-8080-exec-3] o.a.k.clients.producer.ProducerConfig : ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [localhost:9092]
buffer.memory = 33554432
client.id =
compression.type = none
connections.max.idle.ms = 540000
enable.idempotence = false
interceptor.classes = null
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 100
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.StringSerializer
2018-04-03 13:43:47.315 INFO 8068 --- [nio-8080-exec-3] o.a.kafka.common.utils.AppInfoParser : Kafka version : 0.11.0.0
2018-04-03 13:43:47.315 INFO 8068 --- [nio-8080-exec-3] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : cb8625948210849f
EDIT: I created the topic with:
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 13 --topic my-topic
This is my server.properties file:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3
# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma seperated list of directories under which to store log files
log.dirs=/tmp/kafka-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=localhost:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0

group.id =
You need a group.id for the consumer.
Set it in the consumer factory properties, as in the sketch below.
BTW, when using Boot, you don't need a consumer factory bean or a container factory bean; you can use Boot properties for all of that (for example spring.kafka.consumer.group-id).
Logging can be enabled with logging.level... in the properties/yaml.
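For reference, a minimal sketch of that change in the consumer factory from the question; the group name "my-group" is only a placeholder, any stable value will do:

@Bean
public ConsumerFactory<String, String> consumerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    // The question's config left group.id empty; set one explicitly here.
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    return new DefaultKafkaConsumerFactory<>(props);
}

Newer spring-kafka versions also accept a groupId attribute directly on @KafkaListener, which overrides the factory-level group for that listener.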

Related

Kafka Spring boot automatically starting idempotent producer in the consumer application

We are experiencing this issue in the dev environment, which was working well previously. In the local environment the application runs without starting an idempotent producer, which is the expected behavior.
Issue: sometimes an idempotent producer starts running automatically when the Spring Boot application is deployed, and as a result the consumer fails to consume the messages produced by the actual producer.
Adding a snippet of relevant log info:
2022-07-05 15:17:54.449 WARN 7 --- [ntainer#0-0-C-1] o.s.k.l.DeadLetterPublishingRecoverer : Destination resolver returned non-existent partition consumer-topic.DLT-1, KafkaProducer will determine partition to use for this topic
2022-07-05 15:17:54.853 INFO 7 --- [ntainer#0-0-C-1] o.a.k.clients.producer.ProducerConfig : ProducerConfig values:
acks = -1
batch.size = 16384
bootstrap.servers = [kafka server urls]
buffer.memory = 33554432
.
.
.
2022-07-05 15:17:55.047 INFO 7 --- [ntainer#0-0-C-1] o.a.k.clients.producer.KafkaProducer : [Producer clientId=producer-1] Instantiated an idempotent producer.
2022-07-05 15:17:55.347 INFO 7 --- [ntainer#0-0-C-1] o.a.kafka.common.utils.AppInfoParser : Kafka version: 3.0.1
2022-07-05 15:17:55.348 INFO 7 --- [ntainer#0-0-C-1] o.a.kafka.common.utils.AppInfoParser : Kafka commitId:
2022-07-05 15:17:57.162 INFO 7 --- [ad | producer-1] org.apache.kafka.clients.Metadata : [Producer clientId=producer-1] Cluster ID: XFGlM9HVScGD-PafRlFH7g
2022-07-05 15:17:57.169 INFO 7 --- [ad | producer-1] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-1] ProducerId set to 6013 with epoch 0
2022-07-05 15:18:56.681 ERROR 7 --- [ntainer#0-0-C-1] o.s.k.support.LoggingProducerListener : Exception thrown when sending a message with key='null' and payload='byte[63]' to topic consumer-topic.DLT:
org.apache.kafka.common.errors.TimeoutException: Topic consumer-topic.DLT not present in metadata after 60000 ms.
2022-07-05 15:18:56.748 ERROR 7 --- [ntainer#0-0-C-1] o.s.k.l.DeadLetterPublishingRecoverer : Dead-letter publication to consumer-topic.DLTfailed for: consumer-topic-1#28
org.springframework.kafka.KafkaException: Send failed; nested exception is org.apache.kafka.common.errors.TimeoutException: Topic consumer-topic.DLT not present in metadata after 60000 ms.
at org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:660) ~[spring-kafka-2.8.5.jar!/:2.8.5]
.
.
.
2022-07-05 15:18:56.751 ERROR 7 --- [ntainer#0-0-C-1] o.s.kafka.listener.DefaultErrorHandler : Recovery of record (consumer-topic-1#28) failed
2022-07-05 15:18:56.758 INFO 7 --- [ntainer#0-0-C-1] o.a.k.clients.consumer.KafkaConsumer : [Consumer clientId=consumer-consumer-topic-group-test-1, groupId=consumer-topic-group-test] Seeking to offset 28 for partition c-1
2022-07-05 15:18:56.761 ERROR 7 --- [ntainer#0-0-C-1] o.s.k.l.KafkaMessageListenerContainer : Error handler threw an exception
org.springframework.kafka.KafkaException: Seek to current after exception; nested exception is org.springframework.kafka.listener.ListenerExecutionFailedException: Listener failed; nested exception is org.springframework.kafka.support.serializer.DeserializationException: failed to deserialize; nested exception is java.lang.IllegalStateException: No type information in headers and no default type provided
As we can see in the logs, the application has started an idempotent producer automatically, and after starting it began throwing errors.
Context: we have two microservices. One publishes the messages and contains the producer config; the second only consumes messages and does not contain any producer config.
The yml configuration for the producer application:
kafka:
  bootstrap-servers: "kafka urls"
  properties:
    security.protocol: SASL_SSL
    sasl.mechanism: SCRAM-SHA-512
    sasl.jaas.config: "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"${KAFKA_USER}\" password=\"${KAFKA_PWD}\";"
  producer:
    key-serializer: org.apache.kafka.common.serialization.StringSerializer
    value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
    properties:
      spring.json.trusted.packages: "*"
    acks: 1
YML configuration for consumer application:
kafka:
  bootstrap-servers: "kafka URL, kafka url2"
  properties:
    security.protocol: SASL_SSL
    sasl.mechanism: SCRAM-SHA-512
    sasl.jaas.config: "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"${KAFKA_USER}\" password=\"${KAFKA_PWD}\";"
  consumer:
    enable-auto-commit: true
    auto-offset-reset: latest
    key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
    value-deserializer: org.springframework.kafka.support.serializer.ErrorHandlingDeserializer
    properties:
      spring.deserializer.value.delegate.class: org.springframework.kafka.support.serializer.JsonDeserializer
      spring.json.trusted.packages: "*"
consumer-topic:
  topic: consumer-topic
  group-abc:
    group-id: consumer-topic-group-abc
The kafka bean for default error handler
@Bean
public CommonErrorHandler errorHandler(KafkaOperations<Object, Object> kafkaOperations) {
    return new DefaultErrorHandler(new DeadLetterPublishingRecoverer(kafkaOperations));
}
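For context, this is roughly how such a bean ends up on the listener containers when the container factory is declared manually; recent Spring Boot versions wire a single CommonErrorHandler bean into the auto-configured factory for you, so treat this only as an illustrative sketch (the generic types and bean names are placeholders):

@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory(
        ConsumerFactory<String, Object> consumerFactory, CommonErrorHandler errorHandler) {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    // The DeadLetterPublishingRecoverer inside the handler publishes failed records
    // through the injected KafkaOperations, i.e. through a KafkaProducer.
    factory.setCommonErrorHandler(errorHandler);
    return factory;
}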
We know a temporary fix: if we delete the group id and recreate it, the application works again. But after some deployments the issue comes back, and we don't know its root cause.
Please guide.

Read data from Kafka using Batch not work correctly using SpringBoot

I am using Spring Boot and want to read data from Kafka in batches. My application.yml looks like this:
spring:
  kafka:
    bootstrap-servers:
      - localhost:9092
    properties:
      schema.registry.url: http://localhost:8081
    consumer:
      auto-offset-reset: earliest
      max-poll-records: 50000
      enable-auto-commit: true
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: io.confluent.kafka.serializers.KafkaAvroDeserializer
      group-id: "batch"
      properties:
        fetch.min.bytes: 1000000
        fetch.max.wait.ms: 20000
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: io.confluent.kafka.serializers.KafkaAvroSerializer
    listener:
      type: batch
My listener:
@KafkaListener(id = "bar2", topics = "TestTopic")
public void listen(List<ConsumerRecord<String, GenericRecord>> records) {
    log.info("start of batch receive. Size::{}", records.size());
}
In log I see:
2019-10-04 11:08:19.693 INFO 2123 --- [ bar2-0-C-1] kafka.batch.demo.DemoApplication : start of batch receive. Size::33279
2019-10-04 11:08:19.746 INFO 2123 --- [ bar2-0-C-1] kafka.batch.demo.DemoApplication : start of batch receive. Size::33353
2019-10-04 11:08:19.784 INFO 2123 --- [ bar2-0-C-1] kafka.batch.demo.DemoApplication : start of batch receive. Size::33400
2019-10-04 11:08:19.821 INFO 2123 --- [ bar2-0-C-1] kafka.batch.demo.DemoApplication : start of batch receive. Size::33556
2019-10-04 11:08:39.859 INFO 2123 --- [ bar2-0-C-1] kafka.batch.demo.DemoApplication : start of batch receive. Size::16412
I set the settings that seemed required, fetch.min.bytes and fetch.max.wait.ms, but they have no effect.
In the log I see that a batch is never larger than roughly 33 thousand records, whatever the settings. I have racked my brain and I don't understand why this is happening.
max.poll.records is simply a maximum.
There are other properties that influence how many records you get
fetch.min.bytes - The minimum amount of data the server should return for a fetch request. If insufficient data is available the request will wait for that much data to accumulate before answering the request. The default setting of 1 byte means that fetch requests are answered as soon as a single byte of data is available or the fetch request times out waiting for data to arrive. Setting this to something greater than 1 will cause the server to wait for larger amounts of data to accumulate which can improve server throughput a bit at the cost of some additional latency.
fetch.max.wait.ms- The maximum amount of time the server will block before answering the fetch request if there isn't sufficient data to immediately satisfy the requirement given by fetch.min.bytes.
See the documentation.
There is no way to exactly control the minimum number of records (unless they are all identical in length).
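For reference, a minimal sketch of the same knobs set programmatically on a consumer factory; the values are taken from the question's YAML, and this bean is only an illustration (the Boot properties above already cover it):

@Bean
public ConsumerFactory<String, GenericRecord> batchConsumerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "batch");
    // Upper bound only: a single poll never returns more than this many records.
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 50000);
    // The broker tries to accumulate at least this many bytes per fetch...
    props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1_000_000);
    // ...but answers the fetch anyway once this wait time is reached.
    props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 20_000);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    // schema.registry.url would also be needed for the Avro deserializer; omitted here.
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class);
    return new DefaultKafkaConsumerFactory<>(props);
}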

ksql-server refuses to boot up

I am facing the following problem using Confluent Open Source platform, version 4.1.0:
[2018-05-01 03:43:33,433] ERROR Failed to initialize TopicClient: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. (io.confluent.ksql.util.KafkaTopicClient:257)
Exception in thread "main" io.confluent.ksql.util.KsqlException: Could not fetch broker information. KSQL cannot initialize AdminClient.
at io.confluent.ksql.util.KafkaTopicClientImpl.init(KafkaTopicClientImpl.java:258)
at io.confluent.ksql.util.KafkaTopicClientImpl.<init>(KafkaTopicClientImpl.java:62)
at io.confluent.ksql.rest.server.KsqlRestApplication.buildApplication(KsqlRestApplication.java:237)
at io.confluent.ksql.rest.server.KsqlServerMain.createExecutable(KsqlServerMain.java:58)
at io.confluent.ksql.rest.server.KsqlServerMain.main(KsqlServerMain.java:39)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
Changing the listener port didn't help. How do we fix this?
EDIT1: I am starting the kafka brokers and ksql-server using
confluent start
Initially, "confluent status" shows that the ksql-server is UP, but the server goes down after the above timeout.
EDIT2: Yes, my kafka broker is running and here is my kafka server.properties:
broker.id=100
listeners=PLAINTEXT://localhost:19090
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs-100
num.partitions=3
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=6000
confluent.support.customer.id=anonymous
group.initial.rebalance.delay.ms=0
and ksql-server.properties:
bootstrap.servers=localhost:19090
listeners=http://localhost:18088
ksql.server.ui.enabled=true
EDIT3: My suspicion is that this has something to do with an incorrect bootstrap server URL, but I have not been able to confirm that yet.
EDIT4: KSQL server logs, as requested.
[2018-05-17 03:41:33,244] INFO KsqlRestConfig values:
metric.reporters = []
ssl.client.auth = false
ksql.server.install.dir = /home/<user name>/confluent/confluent-4.1.0
response.mediatype.default = application/json
authentication.realm =
ssl.keystore.type = JKS
ssl.trustmanager.algorithm =
authentication.method = NONE
metrics.jmx.prefix = rest-utils
request.logger.name = io.confluent.rest-utils.requests
ssl.key.password = [hidden]
ssl.truststore.password = [hidden]
authentication.roles = [*]
metrics.num.samples = 2
ssl.endpoint.identification.algorithm =
compression.enable = false
query.stream.disconnect.check = 1000
ssl.protocol = TLS
debug = false
listeners = [http://localhost:18088]
ssl.provider =
ssl.enabled.protocols = []
shutdown.graceful.ms = 1000
ssl.keystore.location =
response.mediatype.preferred = [application/json]
ssl.cipher.suites = []
authentication.skip.paths = []
ssl.truststore.type = JKS
access.control.allow.methods =
access.control.allow.origin =
ssl.truststore.location =
ksql.server.command.response.timeout.ms = 5000
ssl.keystore.password = [hidden]
ssl.keymanager.algorithm =
port = 8080
metrics.sample.window.ms = 30000
metrics.tag.map = {}
ksql.server.ui.enabled = true
(io.confluent.ksql.rest.server.KsqlRestConfig:179)
[2018-05-17 03:41:33,302] INFO KsqlConfig values:
ksql.persistent.prefix = query_
ksql.schema.registry.url = http://localhost:8081
ksql.service.id = default_
ksql.sink.partitions = 4
ksql.sink.replicas = 1
ksql.sink.window.change.log.additional.retention = 1000000
ksql.statestore.suffix = _ksql_statestore
ksql.transient.prefix = transient_
(io.confluent.ksql.util.KsqlConfig:279)
[2018-05-17 03:43:33,433] ERROR Failed to initialize TopicClient: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. (io.confluent.ksql.util.KafkaTopicClient:257)
Exception in thread "main" io.confluent.ksql.util.KsqlException: Could not fetch broker information. KSQL cannot initialize AdminClient.
at io.confluent.ksql.util.KafkaTopicClientImpl.init(KafkaTopicClientImpl.java:258)
at io.confluent.ksql.util.KafkaTopicClientImpl.<init>(KafkaTopicClientImpl.java:62)
at io.confluent.ksql.rest.server.KsqlRestApplication.buildApplication(KsqlRestApplication.java:237)
at io.confluent.ksql.rest.server.KsqlServerMain.createExecutable(KsqlServerMain.java:58)
at io.confluent.ksql.rest.server.KsqlServerMain.main(KsqlServerMain.java:39)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:258)
at io.confluent.ksql.util.KafkaTopicClientImpl.init(KafkaTopicClientImpl.java:230)
... 4 more
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
I had this exact same problem on Confluent Platform 5.4.1 and 5.3.1, running macOS 10.14.6. It turned out another application had taken port 8081, so schema-registry was not able to bind to it. I configured schema-registry to use port 8881 and pointed the schema-registry setting in the ksql-server configuration at that same value. This solved the problem.
Therefore I would suggest you check that schema-registry is able to bind its configured port and that ksql-server is able to connect to that same port.
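If it helps with the first check, here is a throwaway Java sketch to see whether something else already holds the port (8081 is just the default schema-registry port from this setup; adjust as needed):

import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {
    public static void main(String[] args) {
        int port = 8081; // schema-registry default; change to the configured port
        try (ServerSocket socket = new ServerSocket(port)) {
            System.out.println("Port " + port + " is free, so schema-registry should be able to bind it.");
        } catch (IOException e) {
            System.out.println("Port " + port + " is already in use: " + e.getMessage());
        }
    }
}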

How to make Kafka broker failover to work regarding consumers?

It seems very complicated to make a replicated broker setup work correctly for consumers: when I stop certain brokers, some consumers stop working, and when that broker is up again, those consumers receive all the "missing" messages at once.
I am using a 2-broker scenario. I created a replicated topic like this:
$KAFKA_HOME/bin/kafka-topics.sh --create \
--zookeeper localhost:2181 \
--replication-factor 2 \
--partitions 3 \
--topic replicated_topic
The excerpt from the server config looks like this (note that it is the same for both servers except for the port, broker id and log dir):
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3
# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma seperated list of directories under which to store log files
log.dirs=/tmp/kafka-logs0
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=localhost:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
Let's describe my topic using 2 brokers:
Topic:replicated_topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: replicated_topic Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: replicated_topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 1,0
Topic: replicated_topic Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
Let's see the code for the consumer (it implements Callable):
@Override
public Void call() throws Exception {
    final Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, IntegerDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    final Consumer<Integer, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList(topicName));
    ConsumerRecords<Integer, String> records = null;
    while (!Thread.currentThread().isInterrupted()) {
        records = consumer.poll(1000);
        if (records.isEmpty()) {
            continue;
        }
        records.forEach(rec -> LOGGER.debug("{}#{} consumed from topic {}, partition {} pair ({},{})",
                ConsumerCallable.class.getSimpleName(), hashCode(), rec.topic(), rec.partition(), rec.key(), rec.value()));
        consumer.commitAsync();
    }
    consumer.close();
    return null;
}
Producer and main code:
private static final String TOPIC_NAME = "replicated_topic";
private static final String BOOTSTRAP_SERVERS = "localhost:9092, localhost:9093";
private static final Logger LOGGER = LoggerFactory.getLogger(Main.class);

public static void main(String[] args) {
    ExecutorService executor = Executors.newCachedThreadPool();
    executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group1"));
    executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group2"));
    executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group3"));
    try (Producer<Integer, String> producer = createProducer()) {
        Scanner scanner = new Scanner(System.in);
        String line = null;
        LOGGER.debug("Please enter 'k v' on the command line to send to Kafka or 'quit' to exit");
        while (scanner.hasNextLine()) {
            line = scanner.nextLine();
            if (line.trim().toLowerCase().equals("quit")) {
                break;
            }
            String[] elements = line.split(" ");
            Integer key = Integer.parseInt(elements[0]);
            String value = elements[1];
            producer.send(new ProducerRecord<>(TOPIC_NAME, key, value));
            producer.flush();
        }
    }
    executor.shutdownNow();
}

private static Producer<Integer, String> createProducer() {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
    props.put(ProducerConfig.CLIENT_ID_CONFIG, "KafkaExampleProducer");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, IntegerSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    return new KafkaProducer<>(props);
}
Now let's see the behaviour:
All brokers are up:
Output of kafka topic:
Topic:replicated_topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: replicated_topic Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: replicated_topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 1,0
Topic: replicated_topic Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
Output of program:
12:52:30.460 DEBUG Main - Please enter 'k v' on the command line to send to Kafka or 'quit' to exit
1 u
12:52:35.555 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 0 pair (1,u)
12:52:35.559 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 0 pair (1,u)
12:52:35.559 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 0 pair (1,u)
2 d
12:52:38.096 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 2 pair (2,d)
12:52:38.098 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 2 pair (2,d)
12:52:38.100 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 2 pair (2,d)
Since the consumers are in different groups, all messages are broadcast to all of them; everything is OK.
2 Bring down broker 2:
Describe topic:
Topic:replicated_topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: replicated_topic Partition: 0 Leader: 0 Replicas: 1,0 Isr: 0
Topic: replicated_topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0
Topic: replicated_topic Partition: 2 Leader: 0 Replicas: 1,0 Isr: 0
Output of program:
3 t
12:57:03.898 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 1 pair (3,t)
4 p
12:57:06.058 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 1 pair (4,p)
Now only 1 consumer receives data. Let's bring up broker 2 again:
Now the other 2 consumers receive the missing data:
12:57:50.863 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 1 pair (3,t)
12:57:50.863 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 1 pair (4,p)
12:57:50.870 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 1 pair (3,t)
12:57:50.870 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 1 pair (4,p)
Bring down broker 1:
Now only 2 consumers receive data:
5 c
12:59:13.718 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 2 pair (5,c)
12:59:13.737 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 2 pair (5,c)
6 s
12:59:16.437 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 2 pair (6,s)
12:59:16.438 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 2 pair (6,s)
If I bring it back up, the other consumer will also receive the missing data.
My point, guys (sorry for the big write-up, but I am trying to capture the context), is: how do I make sure that no matter which broker I stop, the consumers keep working correctly (i.e. receive all messages normally)?
PS: I tried setting offsets.topic.replication.factor=2 or 3, but it didn't have any effect.
Messages to that broker will not be ignored if the number of alive brokers is lower than the configured replicas. Whenever a Kafka broker rejoins the cluster, the data gets replicated to that node. https://stackoverflow.com/a/38998062/6274525
So when your broker 2 goes down, the messages still get pushed to the other alive broker, because there is 1 live broker and the replication factor is 2. Since your other 2 consumers are subscribed to broker 2 (which is down), they are unable to consume.
When your broker 2 is up again, the data gets replicated to that node, and hence the consumers attached to it receive the messages (referred to by you as the "missing" messages).
Please make sure you have changed the property offsets.topic.replication.factor to at least 3.
This property is used to manage offset and consumer interaction. When a Kafka server is started, it auto-creates a topic named __consumer_offsets. So if replicas are not created for this topic, a consumer cannot know for sure whether something has been pushed to the topic it was listening to.
Link to the details of this property: https://kafka.apache.org/documentation/#brokerconfigs
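If you want to verify the current assignment, here is a minimal AdminClient sketch (assuming a broker reachable on localhost:9092) that prints the replica list of each __consumer_offsets partition:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class CheckOffsetsTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription description = admin
                    .describeTopics(Collections.singleton("__consumer_offsets"))
                    .all().get().get("__consumer_offsets");
            // Each partition should list as many replicas as the offsets.topic.replication.factor
            // that was in effect when the topic was first created.
            description.partitions().forEach(p ->
                    System.out.println("partition " + p.partition() + " replicas " + p.replicas()));
        }
    }
}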

kafka-log4j-appender 0.9 does not work

I added a log4j Kafka appender to my log4j.properties, but it did not work as I expected.
Before posting this question, I checked my log4j.properties against this similar question on Stack Overflow, which is about 0.8. However, no luck.
Here is my log4j.properties
log4j.appender.Kafka=org.apache.kafka.log4jappender.KafkaLog4jAppender
log4j.appender.Kafka.topic=my-topic
log4j.appender.Kafka.brokerList=localhost:9092
log4j.appender.Kafka.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.Kafka.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
When I started my app, I could see Kafka producer was started:
[main] DEBUG org.apache.kafka.clients.producer.KafkaProducer - Kafka producer started
[kafka-producer-network-thread | producer-1] DEBUG org.apache.kafka.clients.producer.internals.Sender - Starting Kafka producer I/O thread.
But the appender did not work and threw an exception:
[kafka-producer-network-thread | producer-1] DEBUG org.apache.kafka.clients.producer.KafkaProducer - Exception occurred during message send:
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
I also checked my Kafka + Zookeeper environment, and it is configured correctly in my log4j.properties. At this point I have no idea what is wrong and hope someone can give me a hand. The following is the whole output:
[main] INFO org.apache.kafka.clients.producer.ProducerConfig - ProducerConfig values:
compression.type = none
metric.reporters = []
metadata.max.age.ms = 300000
metadata.fetch.timeout.ms = 60000
reconnect.backoff.ms = 50
sasl.kerberos.ticket.renew.window.factor = 0.8
bootstrap.servers = [localhost:9092]
retry.backoff.ms = 100
sasl.kerberos.kinit.cmd = /usr/bin/kinit
buffer.memory = 33554432
timeout.ms = 30000
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
ssl.keystore.type = JKS
ssl.trustmanager.algorithm = PKIX
block.on.buffer.full = false
ssl.key.password = null
max.block.ms = 60000
sasl.kerberos.min.time.before.relogin = 60000
connections.max.idle.ms = 540000
ssl.truststore.password = null
max.in.flight.requests.per.connection = 5
metrics.num.samples = 2
client.id =
ssl.endpoint.identification.algorithm = null
ssl.protocol = TLS
request.timeout.ms = 30000
ssl.provider = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
acks = 1
batch.size = 16384
ssl.keystore.location = null
receive.buffer.bytes = 32768
ssl.cipher.suites = null
ssl.truststore.type = JKS
security.protocol = PLAINTEXT
retries = 0
max.request.size = 1048576
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
ssl.truststore.location = null
ssl.keystore.password = null
ssl.keymanager.algorithm = SunX509
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
send.buffer.bytes = 131072
linger.ms = 0
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name bufferpool-wait-time
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name buffer-exhausted-records
[main] DEBUG org.apache.kafka.clients.Metadata - Updated cluster metadata version 1 to Cluster(nodes = [Node(-1, localhost, 9092)], partitions = [])
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name connections-closed:client-id-producer-1
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name connections-created:client-id-producer-1
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name bytes-sent-received:client-id-producer-1
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name bytes-sent:client-id-producer-1
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name bytes-received:client-id-producer-1
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name select-time:client-id-producer-1
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name io-time:client-id-producer-1
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name batch-size
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name compression-rate
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name queue-time
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name request-time
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name produce-throttle-time
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name records-per-request
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name record-retries
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name errors
[main] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name record-size-max
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 0.9.0.0
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : fc7243c2af4b2b4a
[main] DEBUG org.apache.kafka.clients.producer.KafkaProducer - Kafka producer started
[kafka-producer-network-thread | producer-1] DEBUG org.apache.kafka.clients.producer.internals.Sender - Starting Kafka producer I/O thread.
...
[kafka-producer-network-thread | producer-1] DEBUG org.apache.kafka.clients.producer.KafkaProducer - Exception occurred during message send:
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
Thanks
Finally, I fixed it. Here is my new log4j.properties:
log4j.rootLogger=DEBUG, Console
log4j.appender.Console=org.apache.log4j.ConsoleAppender
log4j.appender.Console.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.Console.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
log4j.category.foxgem=DEBUG, Kafka
log4j.additivity.foxgem=false
log4j.appender.Kafka=org.apache.kafka.log4jappender.KafkaLog4jAppender
log4j.appender.Kafka.topic=logTopic
log4j.appender.Kafka.brokerList=localhost:9092
log4j.appender.Kafka.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.Kafka.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
log4j.logger.io.vertx=WARN
log4j.logger.io.netty=WARN
I also created an example to show how to use this appender on github.
The changes I made:
Remove the Kafka appender from rootLogger. In my old log4j.properties it was:
log4j.rootLogger=DEBUG, Console, Kafka
Add a log category for those packages whose log output will go to Kafka:
log4j.category.foxgem=DEBUG, Kafka
log4j.additivity.foxgem=false
I think the reason is: with the old rootLogger, the log output from the Kafka client itself also went to Kafka, which caused the timeout.
I had a problem like this with both log4j and logback.
When the appender level was INFO everything worked fine, but when I changed the level to DEBUG I started getting this error after some time:
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
The problem was that KafkaProducer itself has trace and debug logs and was trying to append those logs to Kafka too, so it was stuck in a loop.
Changing the org.apache.kafka package log level to INFO, or pointing its appender at a file or stdout instead, solved the problem.
