ksql-server refuses to boot up - java

I am facing the following problem using Confluent Open Source platform, version 4.1.0:
[2018-05-01 03:43:33,433] ERROR Failed to initialize TopicClient: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. (io.confluent.ksql.util.KafkaTopicClient:257)
Exception in thread "main" io.confluent.ksql.util.KsqlException: Could not fetch broker information. KSQL cannot initialize AdminClient.
at io.confluent.ksql.util.KafkaTopicClientImpl.init(KafkaTopicClientImpl.java:258)
at io.confluent.ksql.util.KafkaTopicClientImpl.<init>(KafkaTopicClientImpl.java:62)
at io.confluent.ksql.rest.server.KsqlRestApplication.buildApplication(KsqlRestApplication.java:237)
at io.confluent.ksql.rest.server.KsqlServerMain.createExecutable(KsqlServerMain.java:58)
at io.confluent.ksql.rest.server.KsqlServerMain.main(KsqlServerMain.java:39)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
Changing the listener port didn't help. How do we fix this?
EDIT1: I am starting the Kafka brokers and ksql-server using
confluent start
Initially, "confluent status" shows that the ksql-server is UP, but the server goes down after the above timeout.
EDIT2: Yes, my Kafka broker is running, and here is my Kafka server.properties:
broker.id=100
listeners=PLAINTEXT://localhost:19090
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs-100
num.partitions=3
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=6000
confluent.support.customer.id=anonymous
group.initial.rebalance.delay.ms=0
and ksql-server.properties:
bootstrap.servers=localhost:19090
listeners=http://localhost:18088
ksql.server.ui.enabled=true
EDIT3: My suspicion is that this has something to do with an incorrect bootstrap server URL, but I haven't been able to confirm that yet.
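One quick way to test that suspicion is to point a bare AdminClient at the same bootstrap.servers value from the KSQL host; a minimal sketch (localhost:19090 is taken from the configs above, and the class name is just for illustration):
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class BrokerCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Same value as bootstrap.servers in ksql-server.properties
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:19090");
        try (AdminClient admin = AdminClient.create(props)) {
            // Fails with the same TimeoutException if no broker answers on that address
            System.out.println(admin.describeCluster().nodes().get(10, TimeUnit.SECONDS));
        }
    }
}
If this prints the broker node, the bootstrap URL is fine and the problem lies elsewhere.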
EDIT4: KSQL server logs, as requested.
[2018-05-17 03:41:33,244] INFO KsqlRestConfig values:
metric.reporters = []
ssl.client.auth = false
ksql.server.install.dir = /home/<user name>/confluent/confluent-4.1.0
response.mediatype.default = application/json
authentication.realm =
ssl.keystore.type = JKS
ssl.trustmanager.algorithm =
authentication.method = NONE
metrics.jmx.prefix = rest-utils
request.logger.name = io.confluent.rest-utils.requests
ssl.key.password = [hidden]
ssl.truststore.password = [hidden]
authentication.roles = [*]
metrics.num.samples = 2
ssl.endpoint.identification.algorithm =
compression.enable = false
query.stream.disconnect.check = 1000
ssl.protocol = TLS
debug = false
listeners = [http://localhost:18088]
ssl.provider =
ssl.enabled.protocols = []
shutdown.graceful.ms = 1000
ssl.keystore.location =
response.mediatype.preferred = [application/json]
ssl.cipher.suites = []
authentication.skip.paths = []
ssl.truststore.type = JKS
access.control.allow.methods =
access.control.allow.origin =
ssl.truststore.location =
ksql.server.command.response.timeout.ms = 5000
ssl.keystore.password = [hidden]
ssl.keymanager.algorithm =
port = 8080
metrics.sample.window.ms = 30000
metrics.tag.map = {}
ksql.server.ui.enabled = true
(io.confluent.ksql.rest.server.KsqlRestConfig:179)
[2018-05-17 03:41:33,302] INFO KsqlConfig values:
ksql.persistent.prefix = query_
ksql.schema.registry.url = http://localhost:8081
ksql.service.id = default_
ksql.sink.partitions = 4
ksql.sink.replicas = 1
ksql.sink.window.change.log.additional.retention = 1000000
ksql.statestore.suffix = _ksql_statestore
ksql.transient.prefix = transient_
(io.confluent.ksql.util.KsqlConfig:279)
[2018-05-17 03:43:33,433] ERROR Failed to initialize TopicClient: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. (io.confluent.ksql.util.KafkaTopicClient:257)
Exception in thread "main" io.confluent.ksql.util.KsqlException: Could not fetch broker information. KSQL cannot initialize AdminClient.
at io.confluent.ksql.util.KafkaTopicClientImpl.init(KafkaTopicClientImpl.java:258)
at io.confluent.ksql.util.KafkaTopicClientImpl.<init>(KafkaTopicClientImpl.java:62)
at io.confluent.ksql.rest.server.KsqlRestApplication.buildApplication(KsqlRestApplication.java:237)
at io.confluent.ksql.rest.server.KsqlServerMain.createExecutable(KsqlServerMain.java:58)
at io.confluent.ksql.rest.server.KsqlServerMain.main(KsqlServerMain.java:39)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:258)
at io.confluent.ksql.util.KafkaTopicClientImpl.init(KafkaTopicClientImpl.java:230)
... 4 more
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.

I had this exact same problem on Confluent Platform 5.4.1 and 5.3.1, running macOS 10.14.6. It turned out that another application had taken port 8081, so schema-registry was unable to bind to it. I configured schema-registry to use port 8881 and pointed the ksql-server configuration at that same port, which solved the problem.
I would therefore suggest checking that schema-registry can bind to its configured port and that ksql-server can connect to that same port.
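For reference, a minimal sketch of the two changes (the file paths and the 8881 port are from my setup; adjust to yours):
# etc/schema-registry/schema-registry.properties
listeners=http://0.0.0.0:8881

# etc/ksql/ksql-server.properties
ksql.schema.registry.url=http://localhost:8881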

Related

listTopics with kafka-clients-2.6.0 gives TimeoutException

I have a small piece of code that checks whether a particular topic is already present in Kafka. It worked fine with kafka-clients 2.5.0, but after upgrading kafka-clients to 2.6.0 it started throwing a TimeoutException.
This was my original code.
Properties adminProperties = new Properties();
adminProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
AdminClient adminClient = KafkaAdminClient.create(adminProperties);
boolean topicExists = adminClient.listTopics().names().get().contains("myDataTopic");
For troubleshooting, I split it up and tried extending some timeout values, as shown below, but with no luck. It works fine with 2.5.1 but not with 2.6.0.
Properties adminProperties = new Properties();
adminProperties.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
adminProperties.put(AdminClientConfig.DEFAULT_API_TIMEOUT_MS_CONFIG, "900000");
AdminClient adminClient = KafkaAdminClient.create(adminProperties);

System.out.println("createKafkaTopic(): Listing Topics...");
ListTopicsResult listTopicsResult = adminClient.listTopics(new ListTopicsOptions().timeoutMs(900000));

System.out.println("createKafkaTopic(): Retrieve Topic names...");
KafkaFuture<Collection<TopicListing>> setKafkaFuture = listTopicsResult.listings();

System.out.println("createKafkaTopic(): Display existing Topics...");
while (!setKafkaFuture.isDone()) {
    System.out.println("Waiting...");
    Thread.sleep(10);
}
Collection<TopicListing> topicNames = setKafkaFuture.get(900, TimeUnit.SECONDS);
System.out.println(topicNames);

System.out.println("createKafkaTopic(): Check if Topic exists...");
// Note: topicNames holds TopicListing objects, so contains("myDataTopic") is always
// false here; compare against listing.name() (or use listTopics().names()) instead.
boolean topicExists = topicNames.contains("myDataTopic");
Here is my output:
createKafkaTopic(): Listing Topics...
createKafkaTopic(): Retrieve Topic names...
createKafkaTopic(): Display existing Topics...
Waiting...
Waiting...
Waiting...
Waiting...
Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=listTopics, deadlineMs=1604903438670, tries=1, nextAllowedTryMs=-9223372036854775709) timed out at 9223372036854775807 after 1 attempt(s)
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at KafkaUtil.createKafkaTopic(KafkaUtil.java:45)
at KafkaUtil.main(KafkaUtil.java:21)
Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=listTopics, deadlineMs=1604903438670, tries=1, nextAllowedTryMs=-9223372036854775709) timed out at 9223372036854775807 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: The AdminClient thread has exited.
I saw a similar issue here (How to display topics using Kafka Clients in Java?), but there it seems to have been resolved by adding some dependencies. I tried adding all of those dependencies to my pom.xml too, with no luck.
Upgrading kafka-clients to version 2.6.3 worked for me.
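Once on a fixed client, the original one-liner works; a slightly more defensive sketch (the 30-second timeout is just an example) that also closes the client when done:
Properties adminProperties = new Properties();
adminProperties.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// try-with-resources closes the AdminClient and its internal thread
try (AdminClient adminClient = KafkaAdminClient.create(adminProperties)) {
    boolean topicExists = adminClient.listTopics()
            .names()                        // KafkaFuture<Set<String>> of topic names
            .get(30, TimeUnit.SECONDS)      // bounded wait instead of an indefinite get()
            .contains("myDataTopic");
    System.out.println("myDataTopic exists: " + topicExists);
}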

kafka consumer try to connect to random hostname instead right one

I'm new to Kafka and started exploring with a sample program. It used to work without any issue, but all of a sudden the consumer.poll() call hangs and never returns. Googling suggested checking that the servers are accessible. The producer and consumer Java code runs on the same machine; the producer is able to post records to Kafka, but the consumer's poll method hangs.
Environment:
Kafka version: 1.1.0
Client: Java
Runs in Ubuntu docker container inside windows
Zookeeper and 2 Broker servers runs in same container
When I have enabled logging for client code, I see below exception:
2018-07-06 21:24:18 DEBUG NetworkClient:802 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Error connecting to node 4bdce773eb74:9095 (id: 2 rack: null)
java.io.IOException: Can't resolve address: 4bdce773eb74:9095
at org.apache.kafka.common.network.Selector.doConnect(Selector.java:235)
at org.apache.kafka.common.network.Selector.connect(Selector.java:214)
.................
.................
I'm not sure why the consumer is trying to connect to 4bdce773eb74 even though my broker servers are 192.168.99.100:9094 and 192.168.99.100:9095. My full consumer code:
final String BOOTSTRAP_SERVERS = "192.168.99.100:9094,192.168.99.100:9095";
final Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
props.put(ConsumerConfig.GROUP_ID_CONFIG, "Event_Consumer");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
KafkaConsumer<Long, String> consumer = new KafkaConsumer<Long, String>(props);

TopicPartition tpLogin = new TopicPartition("login1", 0);
TopicPartition tpLogout = new TopicPartition("logout1", 1);
List<TopicPartition> tps = Arrays.asList(tpLogin, tpLogout);
consumer.assign(tps);

while (true) {
    final ConsumerRecords<Long, String> consumerRecords = consumer.poll(1000);
    if (consumerRecords.count() == 0) {
        continue;
    }
    consumerRecords.forEach(record -> {
        System.out.printf("Consumer Record:(%d, %s, %d, %d)\n", record.key(), record.value(),
                record.partition(), record.offset());
    });
    consumer.commitAsync();
    Thread.sleep(5000);
}
}
Please help with this issue.
EDIT
As I said earlier, I have 2 brokers, say broker-1 and broker-2. If I stop broker-1, the above exception is no longer logged, but the poll() method still doesn't return.
The message below is logged indefinitely while broker-1 is stopped:
2018-07-07 11:31:24 DEBUG AbstractCoordinator:579 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Sending FindCoordinator request to broker 192.168.99.100:9094 (id: 1 rack: null)
2018-07-07 11:31:24 DEBUG AbstractCoordinator:590 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Received FindCoordinator response ClientResponse(receivedTimeMs=1530943284196, latencyMs=2, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, clientId=consumer-1, correlationId=573), responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null)))
2018-07-07 11:31:24 DEBUG AbstractCoordinator:613 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Group coordinator lookup failed: The coordinator is not available.
2018-07-07 11:31:24 DEBUG AbstractCoordinator:227 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Coordinator discovery failed, refreshing metadata
Thanks in Advance,
Soman
I found the issue. When I created the topic, broker-0 (port 9093, broker id 0) and broker-2 (port 9094, broker id 2) were running. Today I had mistakenly started broker-1 (port 9095, broker id 1) along with broker-2. Stopping broker-1 and starting broker-0 resolved the issue, and the consumer now receives the events.
It was definitely human error on my side, but I have 2 comments:
I think Kafka should gracefully use broker-2 (port 9094) and ignore broker-1 (port 9095).
Why was Kafka trying to contact 4bdce773eb74:9095 instead of the right IP address (192.168.99.100)?
Thanks.
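On the second point: when a listener doesn't specify a host, the broker advertises the hostname returned by java.net.InetAddress.getCanonicalHostName(), which inside a Docker container is the container id (here 4bdce773eb74). Clients only use the bootstrap list for the first metadata request; after that they connect to whatever addresses the brokers advertise. A sketch of the usual fix in each broker's server.properties (the IP and port are taken from the question; adjust per broker):
listeners=PLAINTEXT://0.0.0.0:9095
advertised.listeners=PLAINTEXT://192.168.99.100:9095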

Spring-Kafka consumer doesn't receive messages

I don't know what's going on: my Java client consumer annotated with @KafkaListener doesn't receive any messages. When I create a consumer via the command line it works, and the producer also works as expected (also in Java). Could someone help me understand this behavior?
application.yml
kafka:
  bootstrap-servers: localhost:9092
  topic: my-topic
producer config:
@Configuration
public class KafkaProducerConfig {

    @Value("${kafka.bootstrap-servers}")
    private String bootstrapServers;

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> configProps = new HashMap<>();
        configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(configProps);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}
consumer config:
@EnableKafka
@Configuration
class KafkaConsumerConfig {

    @Value("${kafka.bootstrap-servers}")
    String bootstrapServers;

    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        return new DefaultKafkaConsumerFactory<>(props);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        return factory;
    }
}
Producer & Consumer:
@Service
class Producer {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    @Value("${kafka.topic}")
    String kafkaTopic;

    public void send(String payload) {
        System.out.println("sending " + payload + " to " + kafkaTopic);
        kafkaTemplate.send(kafkaTopic, payload);
    }
}

@Service
public class Consumer {

    @KafkaListener(topics = "${kafka.topic}")
    public void receive(String payload) {
        System.out.println(payload + " aaaaaaaaaaaaaaaaaaaaaaaaaaa");
    }
}
Spring controller:
@RestController
@RequestMapping(value = "/kafka")
class WebRestController {

    @Autowired
    Producer producer;

    @GetMapping(value = "/producer")
    public String producer(String data) {
        producer.send(data);
        return "Done";
    }
}
This is my console output. As you can see, it sends a message but the listener method doesn't receive anything. It works if I'm not using spring-kafka, just the pure kafka API. It also works when I attach a consumer on the command line: I see the messages sent by the Java producer.
2018-04-03 13:43:41.688 INFO 8068 --- [ main] o.a.k.clients.consumer.ConsumerConfig : ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [localhost:9092]
check.crcs = true
client.id =
connections.max.idle.ms = 540000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id =
heartbeat.interval.ms = 3000
interceptor.classes = null
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 305000
retry.backoff.ms = 100
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
2018-04-03 13:43:41.743 INFO 8068 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka version : 0.11.0.0
2018-04-03 13:43:41.743 INFO 8068 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : cb8625948210849f
2018-04-03 13:43:41.774 INFO 8068 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2018-04-03 13:43:41.777 INFO 8068 --- [ main] kafka.KafkaExample : Started KafkaExample in 3.653 seconds (JVM running for 4.195)
2018-04-03 13:43:47.245 INFO 8068 --- [nio-8080-exec-3] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring FrameworkServlet 'dispatcherServlet'
2018-04-03 13:43:47.245 INFO 8068 --- [nio-8080-exec-3] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization started
2018-04-03 13:43:47.264 INFO 8068 --- [nio-8080-exec-3] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization completed in 19 ms
sending Hello to my-topic
2018-04-03 13:43:47.300 INFO 8068 --- [nio-8080-exec-3] o.a.k.clients.producer.ProducerConfig : ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [localhost:9092]
buffer.memory = 33554432
client.id =
compression.type = none
connections.max.idle.ms = 540000
enable.idempotence = false
interceptor.classes = null
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 100
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.StringSerializer
2018-04-03 13:43:47.315 INFO 8068 --- [nio-8080-exec-3] o.a.kafka.common.utils.AppInfoParser : Kafka version : 0.11.0.0
2018-04-03 13:43:47.315 INFO 8068 --- [nio-8080-exec-3] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : cb8625948210849f
EDIT:
kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 13 --topic my-topic
This is my server.properties file:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3
# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma seperated list of directories under which to store log files
log.dirs=/tmp/kafka-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=localhost:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
group.id =
You need a group.id for the consumer. Set it in the consumer factory properties; a sketch follows below.
BTW, when using Boot, you don't need a consumer factory bean or container factory bean; you can use Boot properties for all of that. Logging can be enabled with logging.level... entries in the properties/yaml.
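A minimal sketch of the missing property in the existing consumerFactory() (the group name is just an example):
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
Or, with Boot configuration instead of the factory beans, in application.yml:
spring:
  kafka:
    consumer:
      group-id: my-group
      auto-offset-reset: earliest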

Kafka 0.10 Java Client TimeoutException: Batch containing 1 record(s) expired

I have a single node, multi (3) broker Zookeeper / Kafka setup. I am using the Kafka 0.10 Java client.
I wrote the following simple remote producer (running on a different server than Kafka); in the code I replaced my public IP address with MYIP:
Properties config = new Properties();
try {
    config.put(ProducerConfig.CLIENT_ID_CONFIG, InetAddress.getLocalHost().getHostName());
    config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "MYIP:9092, MYIP:9093, MYIP:9094");
    config.put(ProducerConfig.ACKS_CONFIG, "all");
    config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArraySerializer");
    producer = new KafkaProducer<String, byte[]>(config);

    Schema.Parser parser = new Schema.Parser();
    schema = parser.parse(GATEWAY_SCHEMA);
    recordInjection = GenericAvroCodecs.toBinary(schema);
    GenericData.Record avroRecord = new GenericData.Record(schema);
    // Filling in avroRecord (code not here)
    byte[] bytes = recordInjection.apply(avroRecord);

    Future<RecordMetadata> future = producer.send(new ProducerRecord<String, byte[]>(datasetId + "", "testKey", bytes));
    RecordMetadata data = future.get();
} catch (Exception e) {
    e.printStackTrace();
}
My server properties for the 3 brokers look like this (across the three server.properties files, broker.id is 0, 1, 2; listeners is PLAINTEXT://:9092, PLAINTEXT://:9093, PLAINTEXT://:9094; and host.name is 10.2.0.4, 10.2.0.5, 10.2.0.6).
This is the first server properties file:
broker.id=0
listeners=PLAINTEXT://:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka1-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=6000
When I execute the code, I get the following exception:
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Batch containing 1 record(s) expired due to timeout while requesting metadata from brokers for 100101-0
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:65)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:52)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25)
at com.nr.roles.gateway.GatewayManager.addTransaction(GatewayManager.java:212)
at com.nr.roles.gateway.gw.service(gw.java:126)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:821)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1158)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1090)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
at org.eclipse.jetty.server.Server.handle(Server.java:517)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:261)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:75)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.errors.TimeoutException: Batch containing 1 record(s) expired due to timeout while requesting metadata from brokers for 100101-0
Does anyone know what I am missing? Any help would be appreciated. Thanks a lot.
I encountered the same problem.
You should change your Kafka server.properties to specify an IP address, e.g.:
PLAINTEXT://MYIP:9093
If you don't, Kafka will advertise the machine's hostname, and if the producer cannot resolve that hostname it cannot send messages to Kafka, even if you can telnet to the brokers.
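A sketch for the first broker (10.2.0.4 is the host.name from the question; if the address the producer uses differs from the one the broker binds to, advertised.listeners, which as far as I know is available in 0.10, keeps them separate):
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://MYIP:9092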
Also double-check the port information in your BOOTSTRAP_SERVERS_CONFIG (MYIP:9092, MYIP:9093, MYIP:9094) against the listeners you've mentioned in server.properties (PLAINTEXT://:9092, PLAINTEXT://:9093, PLAINTEXT://:9094); a mismatched port there produces exactly this metadata timeout.
This answer shares some insight: you can increase the request.timeout.ms producer configuration, which allows the client to queue batches for longer before expiring them.
You might also want to look into the batch.size and linger.ms configurations and find the optimum for your case; a sketch follows.
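A sketch of those knobs on the producer config from the question (the values are illustrative, not recommendations):
config.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "60000"); // allow batches to wait longer for metadata
config.put(ProducerConfig.LINGER_MS_CONFIG, "5");              // small batching delay to group sends
config.put(ProducerConfig.BATCH_SIZE_CONFIG, "32768");         // larger batches, fewer requests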

Flume agent unable to deliver event

I have 3 agents. One runs on Windows using a memory channel, and the other two run on Linux using file channels to take data from the Windows agent and put it into HBase.
Can anyone suggest why the following error occurs and what steps would stop it?
2013-12-23 14:50:15,290 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.sink.AvroSink.destroyConnection(AvroSink.java:199)] Avro sinksink1 closing avro client: NettyAvroRpcClient { host: 192.168.101.232, port: 3001 }
2013-12-23 14:50:15,290 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event.
Exception follows: org.apache.flume.EventDeliveryException: Failed to send events
at org.apache.flume.sink.AvroSink.process(AvroSink.java:325)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: 192.168.101.232, port: 3001 }: Failed to send batch
at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236)
at org.apache.flume.sink.AvroSink.process(AvroSink.java:309) ... 3 more
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host: 192.168.101.232, port: 3001 }: Handshake timed out after 20000ms
at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:280)
at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224) ... 4 more
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(Unknown Source)
at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:278)
Following is my Windows config file:
a1.sources = source1
a1.channels = channel1 channel2
a1.sinks = sink1 sink2
a1.sources.source1.handler = com.flume.handler.DynamicJSONHandler
a1.sources.source1.type = org.apache.flume.source.http.HTTPSource
a1.sources.source1.bind = 192.168.101.29
a1.sources.source1.port = 2001
a1.channels.channel1.type = org.apache.flume.channel.MemoryChannel
a1.channels.channel1.capacity = 1000
a1.channels.channel1.transactionCapacity = 1000
a1.sinks.sink1.type = org.apache.flume.sink.AvroSink
a1.sinks.sink1.hostname = 192.168.101.232
a1.sinks.sink1.port = 3001
a1.channels.channel2.type = org.apache.flume.channel.MemoryChannel
a1.channels.channel2.capacity = 1000
a1.channels.channel2.transactionCapacity = 1000
a1.sinks.sink2.type = org.apache.flume.sink.AvroSink
a1.sinks.sink2.hostname = 192.168.101.233
a1.sinks.sink2.port = 3001
a1.sources.source1.channels = channel1 channel2
a1.sinks.sink1.channel = channel1
a1.sinks.sink2.channel = channel2
AvroSink is meant to write data to another Flume agent that has an Avro source listening on the configured host and port. The Avro sink initiates a handshake with the source it connects to, and here that handshake is timing out after 20 seconds, so your network is likely facing major latency issues. Do you see any issues on the agent running the source?
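If the link is slow rather than down, you can also raise the sink's handshake and request timeouts; a sketch against the config above (both settings default to 20000 ms in Flume 1.x, as far as I recall):
a1.sinks.sink1.connect-timeout = 60000
a1.sinks.sink1.request-timeout = 60000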
