I have a Kerberized cluster with 2 Kafka brokers and 3 Zookeeper nodes.
I have a Java application running locally (using kafka-clients 2.4.0) that must produce and read messages from topics on the Kafka brokers.
As a JVM argument I passed:
-Djava.security.auth.login.config=/Users/mypath/kafka_client_jaas.conf
I also have a Confluent Schema Registry running locally, connected to the cluster.
To create a Producer I set these options:
Properties producerProps = new Properties();
producerProps.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
producerProps.put("sasl.kerberos.service.name", "kafka");
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, Constants.KAFKA_BROKERS);
producerProps.put(ProducerConfig.CLIENT_ID_CONFIG, user);
producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
producerProps.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");
error:
java.io.IOException: Configuration Error:
row 5: expected [option key], found [null]
at sun.security.provider.ConfigFile$Spi.<init>(ConfigFile.java:137)
at sun.security.provider.ConfigFile.<init>(ConfigFile.java:102)
The problem is caused by
producerProps.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
Without this line, the application runs, but I receive:
org.apache.kafka.common.errors.TimeoutException: Topic TEST not present in metadata after 60000 ms.
What should I do?
P.S. The kafka_client_jaas.conf should be correct, since I also use it with the Schema Registry and it works fine there.
kafka_client_jaas.conf:
KafkaClient { com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
keyTab="/Users/path/kafka.service.keytab" \
principal="kafka/kafka-broker01.domain.xx#DOMAIN.XX"; };
Related
I have a small piece of code to check whether a particular topic is already present in Kafka. It worked fine with kafka-clients 2.5.0, but after upgrading kafka-clients to 2.6.0 it started giving a TimeoutException.
This was my original code.
Properties adminProperties = new Properties();
adminProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
AdminClient adminClient = KafkaAdminClient.create(adminProperties);
boolean topicExists = adminClient.listTopics().names().get().contains("myDataTopic");
For troubleshooting, I split it up and tried extending some timeout values as shown below, but with no luck. It works fine with 2.5.1 but not with 2.6.0.
Properties adminProperties = new Properties();
adminProperties.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
adminProperties.put(AdminClientConfig.DEFAULT_API_TIMEOUT_MS_CONFIG, "900000");
AdminClient adminClient = KafkaAdminClient.create(adminProperties);
System.out.println("createKafkaTopic(): Listing Topics...");
ListTopicsResult listTopicsResult = adminClient.listTopics(new ListTopicsOptions().timeoutMs(900000));
System.out.println("createKafkaTopic(): Retrieve Topic names...");
KafkaFuture<Collection<TopicListing>> setKafkaFuture = listTopicsResult.listings();
System.out.println("createKafkaTopic(): Display existing Topics...");
while(!setKafkaFuture.isDone()) {
System.out.println("Waiting...");
Thread.sleep(10);
}
Collection<TopicListing> topicNames = setKafkaFuture.get(900,TimeUnit.SECONDS);
System.out.println(topicNames);
System.out.println("createKafkaTopic(): Check if Topic exists...");
boolean topicExists = topicNames.stream().anyMatch(t -> t.name().equals("myDataTopic")); // compare by topic name, not by TopicListing object
Here is my output:
createKafkaTopic(): Listing Topics...
createKafkaTopic(): Retrieve Topic names...
createKafkaTopic(): Display existing Topics...
Waiting...
Waiting...
Waiting...
Waiting...
Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Call(callName=listTopics, deadlineMs=1604903438670, tries=1, nextAllowedTryMs=-9223372036854775709) timed out at 9223372036854775807 after 1 attempt(s)
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
at KafkaUtil.createKafkaTopic(KafkaUtil.java:45)
at KafkaUtil.main(KafkaUtil.java:21)
Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=listTopics, deadlineMs=1604903438670, tries=1, nextAllowedTryMs=-9223372036854775709) timed out at 9223372036854775807 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: The AdminClient thread has exited.
I saw a similar issue here (How to display topics using Kafka Clients in Java?), but that one seems to have been resolved by adding some dependencies. I tried adding all the dependencies to my pom.xml too, with no luck.
Upgrading kafka-clients to version 2.6.3 worked for me.
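For reference, a minimal sketch of the topic-existence check with the AdminClient closed via try-with-resources (the topic name and bootstrap address are the ones from the question):
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class TopicChecker {
    public static void main(String[] args) throws Exception {
        Properties adminProperties = new Properties();
        adminProperties.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // try-with-resources guarantees the AdminClient is closed even if a call times out
        try (AdminClient adminClient = AdminClient.create(adminProperties)) {
            boolean topicExists = adminClient.listTopics().names().get().contains("myDataTopic");
            System.out.println("myDataTopic exists: " + topicExists);
        }
    }
}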
During a deployment where only the Kafka-Streams version changed from 1.1.1 to 2.x.x (without changing application.id), we got exceptions on the app node with the older Kafka-Streams version; as a result, Kafka Streams changed its state to error and closed, while the app node with the new Kafka-Streams version consumed messages fine.
If we upgrade from 1.1.1 to 2.0.0, we get the error unable to decode subscription data: version=3; if from 1.1.1 to 2.3.0: unable to decode subscription data: version=4.
This can be really painful during a canary deployment: e.g. we have 3 app nodes with the previous Kafka-Streams version, and when we add one more node with the new version, all 3 existing nodes end up in the error state. Error stack trace:
TaskAssignmentException: unable to decode subscription data: version=4
at org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfo.decode(SubscriptionInfo.java:128)
at org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.assign(StreamPartitionAssignor.java:297)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:358)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:520)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1100(AbstractCoordinator.java:93)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:472)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:455)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:822)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:802)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:563)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:390)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:293)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:364)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:316)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:290)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1149)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:831)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:788)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:749)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:719)
The issue is reproducible on both Kafka broker versions 1.1.0 and 2.1.1, even with this simple Kafka-Streams DSL example:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("default.key.serde", "org.apache.kafka.common.serialization.Serdes$StringSerde");
props.put("default.value.serde", "org.apache.kafka.common.serialization.Serdes$StringSerde");
props.put("application.id", "xxx");
StreamsBuilder streamsBuilder = new StreamsBuilder();
streamsBuilder.<String, String>stream("source")
.mapValues(value -> value + value)
.to("destination");
KafkaStreams kafkaStreams = new KafkaStreams(streamsBuilder.build(), props);
kafkaStreams.start();
Is this a bug in kafka-streams? Is there any workaround to prevent such a failure?
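One workaround described in the Kafka Streams upgrade notes, if it applies to your target version, is the upgrade.from setting: set it on the upgraded instances for the first rolling bounce so they keep emitting the old subscription format, then remove it in a second bounce. A sketch, assuming an upgrade from 1.1.x:
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class UpgradeFromExample {

    // Extra config for the first rolling bounce when upgrading an app from Kafka Streams 1.1.x.
    public static Properties firstBounceProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "xxx");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Keep emitting the old (1.1) subscription format so not-yet-upgraded nodes can decode it.
        props.put(StreamsConfig.UPGRADE_FROM_CONFIG, "1.1");
        return props;
    }
    // Second rolling bounce, once every node runs the new version: remove upgrade.from and restart.
}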
My use case:
Using Postman, I call a Spring Boot SOAP endpoint. The endpoint creates a KafkaProducer and sends a message to a specific topic. I also have a TaskScheduler to consume from the topic.
The problem:
When I call the SOAP endpoint to push a message to the topic, I get this error:
2017-11-14 21:29:31.463 ERROR 6389 --- [ad | producer-3] DomainEntityProducer : Expiring 1 record(s) for DomainEntityCommandStream-0: 30030 ms has passed since batch creation plus linger time
2017-11-14 21:29:31.464 ERROR 6389 --- [nio-8080-exec-6] DomainEntityProducer : org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for DomainEntityCommandStream-0: 30030 ms has passed since batch creation plus linger time
Here’s the method I use to push to the topic:
public DomainEntity push(DomainEntity pDomainEntity) throws Exception {
logger.log(Level.INFO, "streaming...");
wKafkaProperties.put("bootstrap.servers", "localhost:9092");
wKafkaProperties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
wKafkaProperties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> wKafkaProducer = new KafkaProducer<>(wKafkaProperties);
ProducerRecord<String, String> wProducerRecord = new ProducerRecord<>("DomainEntityCommandStream", getJSON(pDomainEntity));
wKafkaProducer.send(wProducerRecord, (RecordMetadata r, Exception e) -> {
if (e != null) {
logger.log(Level.SEVERE, e.getMessage());
}
}).get();
return pDomainEntity;
}
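As an aside, KafkaProducer instances are designed to be created once and reused across requests, and they should be closed so buffered records are flushed; a minimal sketch of that pattern (the class and method names below are hypothetical, not the question's code):
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical helper (not the question's code): one long-lived producer, closed on shutdown.
public class DomainEntityKafkaSender implements AutoCloseable {

    private final KafkaProducer<String, String> producer;

    public DomainEntityKafkaSender(Properties kafkaProperties) {
        this.producer = new KafkaProducer<>(kafkaProperties);
    }

    public void push(String topic, String json) throws Exception {
        // Blocks until the broker acknowledges the record, as the original .get() does.
        producer.send(new ProducerRecord<>(topic, json)).get();
    }

    @Override
    public void close() {
        producer.close(); // flushes pending records and releases network resources
    }
}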
Using the command shell scripts
./kafka-console-producer.sh --broker-list 10.0.1.15:9092 --topic
DomainEntityCommandStream
and
./kafka-console-consumer.sh --bootstrap-server 10.0.1.15:9092 --topic
DomainEntityCommandStream --from-beginning
works very well.
Going through some related problems on Stack Overflow, I tried to purge the topic:
./kafka-topics.sh --zookeeper 10.0.1.15:9092 --alter --topic
DomainEntityCommandStream --config retention.ms=1000
Looking at the Kafka logs, I see that the retention time was altered.
But no luck: I still get the same error.
The payload is ridiculously small, so why should I change batch.size?
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:gs="http://soap.problem.com">
<soapenv:Header/>
<soapenv:Body>
<gs:streamDomainEntityRequest>
<gs:domainEntity>
<gs:name>12345</gs:name>
<gs:value>Quebec</gs:value>
<gs:version>666</gs:version>
</gs:domainEntity>
</gs:streamDomainEntityRequest>
</soapenv:Body>
</soapenv:Envelope>
If you are using Docker and the Kafka 0.11.0.1 image, you need to add the following environment parameters to the container:
KAFKA_ZOOKEEPER_CONNECT = X.X.X.X:XXXX (your zookeeper IP or domain : PORT default 2181)
KAFKA_ADVERTISED_HOST_NAME = X.X.X.X (your kafka IP or domain)
KAFKA_ADVERTISED_PORT = XXXX (your kafka PORT number default 9092)
Optionally:
KAFKA_BROKER_ID = 999 (some value)
KAFKA_CREATE_TOPICS=test:1:1 (some topic name to create at start)
If that doesn't work and you still get the same message ("Expiring X record(s) for xxxxx: XXXXX ms has passed since batch creation plus linger time"), you can try cleaning the Kafka data from ZooKeeper.
When I write to a topic in Kafka, there is an error: Offset commit failed:
2016-10-29 14:52:56.387 INFO [nioEventLoopGroup-3-1][org.apache.kafka.common.utils.AppInfoParser$AppInfo:82] - Kafka version : 0.9.0.1
2016-10-29 14:52:56.387 INFO [nioEventLoopGroup-3-1][org.apache.kafka.common.utils.AppInfoParser$AppInfo:83] - Kafka commitId : 23c69d62a0cabf06
2016-10-29 14:52:56.409 ERROR [nioEventLoopGroup-3-1][org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$DefaultOffsetCommitCallback:489] - Offset commit failed.
org.apache.kafka.common.errors.GroupCoordinatorNotAvailableException: The group coordinator is not available.
2016-10-29 14:52:56.519 WARN [kafka-producer-network-thread | producer-1][org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater:582] - Error while fetching metadata with correlation id 0 : {0085000=LEADER_NOT_AVAILABLE}
2016-10-29 14:52:56.612 WARN [pool-6-thread-1][org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater:582] - Error while fetching metadata with correlation id 1 : {0085000=LEADER_NOT_AVAILABLE}
Creating a new topic from the command line works fine:
./kafka-topics.sh --zookeeper localhost:2181 --create --topic test1 --partitions 1 --replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1
This is the producer code using Java:
public void create() {
Properties props = new Properties();
props.clear();
String producerServer = PropertyReadHelper.properties.getProperty("kafka.producer.bootstrap.servers");
String zookeeperConnect = PropertyReadHelper.properties.getProperty("kafka.producer.zookeeper.connect");
String metaBrokerList = PropertyReadHelper.properties.getProperty("kafka.metadata.broker.list");
props.put("bootstrap.servers", producerServer);
props.put("zookeeper.connect", zookeeperConnect);//声明ZooKeeper
props.put("metadata.broker.list", metaBrokerList);//声明kafka broker
props.put("acks", "all");
props.put("retries", 0);
props.put("batch.size", 1000);
props.put("linger.ms", 10000);
props.put("buffer.memory", 10000);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
producer = new KafkaProducer<String, String>(props);
}
Where am I going wrong?
I faced a similar issue. When you start your Kafka broker there is a property associated with it, "KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR". If you are working with a single-node cluster, make sure you set this property to '1', as its default value is 3. This change resolved my problem. (You can check the value in the Kafka properties file.)
Note: I was using the Confluent Kafka base image, version 4.0.0 (confluentinc/cp-kafka:4.0.0).
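For a broker that is not configured through those Docker environment variables, the equivalent line in server.properties would be the sketch below; a value of 1 is only appropriate for a single-node cluster:
offsets.topic.replication.factor=1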
Looking at your logs, the problem is that the cluster probably doesn't have a connection to the node which is the only known replica of the given topic in ZooKeeper.
You can check this using the following command:
kafka-topics.sh --describe --zookeeper localhost:2181 --topic test1
or using kafkacat:
kafkacat -L -b localhost:9092
Example result:
Metadata for all topics (from broker 1003: localhost:9092/1003):
1 brokers:
broker 1003 at localhost:9092
1 topics:
topic "topic1" with 1 partitions:
partition 0, leader -1, replicas: 1001, isrs: , Broker: Leader not available
If you have a single-node cluster, then the broker id (1001) should match the leader of the topic1 partition.
But as you can see, the only known replica of topic1 was 1001, which is not available now, so there is no possibility of recreating the topic on a different node.
The source of the problem can be the automatic generation of the broker id (if you haven't specified broker.id or it is set to -1).
Then, on starting the broker (the same single broker), you probably receive a broker id different from the previous one and different from the one recorded in ZooKeeper (this is the reason why partition deletion can help, but it is not a production solution).
The solution may be setting the broker.id value in the node config to a fixed value; according to the documentation, this should be done in a production environment:
broker.id=1
If everything is all right, you should receive something like this:
Metadata for all topics (from broker 1: localhost:9092/1001):
1 brokers:
broker 1 at localhost:9092
1 topics:
topic "topic1" with 1 partitions:
partition 0, leader 1, replicas: 1, isrs: 1
Kafka Documentation:
https://kafka.apache.org/documentation/#prodconfig
You have to keep your Kafka replicas and the replication factor in your code the same.
For me, I keep 3 replicas and a replication factor of 3.
The solution for me was that I had to make sure KAFKA_ADVERTISED_HOST_NAME was the correct IP address of the server.
We had the same issue: replicas and replication factor were both 3, and the partition count was 1. I increased the partition count to 10 and it started working.
We faced the same issue in production too. The code had been working fine for a long time when we suddenly got this exception.
We analyzed it and found no issue in the code, so we asked the deployment team to restart ZooKeeper. Restarting it solved the issue.
I have a single node, multi (3) broker Zookeeper / Kafka setup. I am using the Kafka 0.10 Java client.
I wrote the following simple remote producer (running on a different server than Kafka); in the code I replaced my public IP address with MYIP:
Properties config = new Properties();
try {
config.put(ProducerConfig.CLIENT_ID_CONFIG, InetAddress.getLocalHost().getHostName());
config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "MYIP:9092, MYIP:9093, MYIP:9094");
config.put(ProducerConfig.ACKS_CONFIG, "all");
config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArraySerializer");
producer = new KafkaProducer<String, byte[]>(config);
Schema.Parser parser = new Schema.Parser();
schema = parser.parse(GATEWAY_SCHEMA);
recordInjection = GenericAvroCodecs.toBinary(schema);
GenericData.Record avroRecord = new GenericData.Record(schema);
//Filling in avroRecord (code not here)
byte[] bytes = recordInjection.apply(avroRecord);
Future<RecordMetadata> future = producer.send(new ProducerRecord<String, byte[]>(datasetId+"", "testKey", bytes));
RecordMetadata data = future.get();
} catch (Exception e) {
e.printStackTrace();
}
My server properties for the 3 brokers look like this (in the 3 different server properties files broker.id is 0, 1, 2 and listeners is PLAINTEXT://:9092, PLAINTEXT://:9093, PLAINTEXT://:9094 and host.name is 10.2.0.4, 10.2.0.5, 10.2.0.6).
This is the first server properties file:
broker.id=0
listeners=PLAINTEXT://:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka1-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=6000
When I execute the code, I get the following exception:
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Batch containing 1 record(s) expired due to timeout while requesting metadata from brokers for 100101-0
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:65)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:52)
at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:25)
at com.nr.roles.gateway.GatewayManager.addTransaction(GatewayManager.java:212)
at com.nr.roles.gateway.gw.service(gw.java:126)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:821)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1158)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1090)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
at org.eclipse.jetty.server.Server.handle(Server.java:517)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:261)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:75)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.errors.TimeoutException: Batch containing 1 record(s) expired due to timeout while requesting metadata from brokers for 100101-0
Does anyone know what I am missing? Any help would be appreciated. Thanks a lot
I encountered the same problem.
You should change your Kafka server.properties to specify the IP address, e.g.:
PLAINTEXT://YOURIP:9093
If not, Kafka will use the hostname; if the producer cannot resolve that host, it cannot send messages to Kafka even if you can telnet to the brokers.
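For example, a sketch of the relevant lines in the first broker's server.properties, assuming MYIP is the public address clients should use (advertised.listeners is what the broker hands back to clients in metadata):
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://MYIP:9092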
The port information in your BOOTSTRAP_SERVERS_CONFIG configuration (MYIP:9092) does not match what you've configured in server.properties: "PLAINTEXT://:9093, PLAINTEXT://:9093, PLAINTEXT://:9094".
This answer shares some insight. You can increase the request.timeout.ms producer configuration, which allows the client to queue batches for longer before expiring them.
You might also want to look into the batch.size and linger.ms configurations and find the optimal values for your case.
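A sketch of where those producer settings go; the values below are purely illustrative assumptions, not tuned recommendations:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class TunedProducerConfig {
    public static KafkaProducer<String, String> build(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "60000"); // let batches wait longer before expiring
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384");         // batch size in bytes (the default)
        props.put(ProducerConfig.LINGER_MS_CONFIG, "5");              // small delay so batches can fill
        return new KafkaProducer<>(props);
    }
}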