The group coordinator is not available - Kafka - Java

When I write to a Kafka topic, there is an error: Offset commit failed:
2016-10-29 14:52:56.387 INFO [nioEventLoopGroup-3-1][org.apache.kafka.common.utils.AppInfoParser$AppInfo:82] - Kafka version : 0.9.0.1
2016-10-29 14:52:56.387 INFO [nioEventLoopGroup-3-1][org.apache.kafka.common.utils.AppInfoParser$AppInfo:83] - Kafka commitId : 23c69d62a0cabf06
2016-10-29 14:52:56.409 ERROR [nioEventLoopGroup-3-1][org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$DefaultOffsetCommitCallback:489] - Offset commit failed.
org.apache.kafka.common.errors.GroupCoordinatorNotAvailableException: The group coordinator is not available.
2016-10-29 14:52:56.519 WARN [kafka-producer-network-thread | producer-1][org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater:582] - Error while fetching metadata with correlation id 0 : {0085000=LEADER_NOT_AVAILABLE}
2016-10-29 14:52:56.612 WARN [pool-6-thread-1][org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater:582] - Error while fetching metadata with correlation id 1 : {0085000=LEADER_NOT_AVAILABLE}
When I create a new topic from the command line, it works fine:
./kafka-topics.sh --zookeeper localhost:2181 --create --topic test1 --partitions 1 --replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1
This is the producer code using Java:
public void create() {
    Properties props = new Properties();
    props.clear();
    String producerServer = PropertyReadHelper.properties.getProperty("kafka.producer.bootstrap.servers");
    String zookeeperConnect = PropertyReadHelper.properties.getProperty("kafka.producer.zookeeper.connect");
    String metaBrokerList = PropertyReadHelper.properties.getProperty("kafka.metadata.broker.list");
    props.put("bootstrap.servers", producerServer);
    props.put("zookeeper.connect", zookeeperConnect); // declare ZooKeeper
    props.put("metadata.broker.list", metaBrokerList); // declare the Kafka brokers
    props.put("acks", "all");
    props.put("retries", 0);
    props.put("batch.size", 1000);
    props.put("linger.ms", 10000);
    props.put("buffer.memory", 10000);
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    producer = new KafkaProducer<String, String>(props);
}
Where am I going wrong?

I faced a similar issue. When you start your Kafka broker there is a property associated with it, KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR. If you are working with a single-node cluster, make sure you set this property to '1', since its default value is 3. This change resolved my problem. (You can check the value in the Kafka properties file.)
Note: I was using the base image of Confluent Kafka version 4.0.0 (confluentinc/cp-kafka:4.0.0).
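A minimal sketch of that setting for a single-broker development setup (the broker property behind that environment variable is offsets.topic.replication.factor, and it only takes effect before the internal __consumer_offsets topic is first created):
offsets.topic.replication.factor=1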

Looking at your logs, the problem is that the cluster probably has no connection to the node which is the only known replica of the given topic in ZooKeeper.
You can check it using the following command:
kafka-topics.sh --describe --zookeeper localhost:2181 --topic test1
or using kafkacat:
kafkacat -L -b localhost:9092
Example result:
Metadata for all topics (from broker 1003: localhost:9092/1003):
1 brokers:
broker 1003 at localhost:9092
1 topics:
topic "topic1" with 1 partitions:
partition 0, leader -1, replicas: 1001, isrs: , Broker: Leader not available
If you have a single-node cluster, the broker id should match the leader of the topic1 partition.
But as you can see, the only known replica of topic1 was 1001, which is not available now, so there is no possibility of recreating the topic's partition on a different node.
The source of the problem can be automatic generation of the broker id (if you haven't specified broker.id or it is set to -1).
Then, on restarting the broker (the same single broker), you probably get a broker id different from the previous one and different from the one recorded in ZooKeeper (this is why deleting the partition data can help, but it is not a production solution).
The solution may be setting broker.id in the node config to a fixed value; according to the documentation, this should be done in production environments:
broker.id=1
If everything is alright, you should see something like this:
Metadata for all topics (from broker 1: localhost:9092/1001):
1 brokers:
broker 1 at localhost:9092
1 topics:
topic "topic1" with 1 partitions:
partition 0, leader 1, replicas: 1, isrs: 1
Kafka Documentation:
https://kafka.apache.org/documentation/#prodconfig
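If you prefer to check this from Java rather than the CLI, below is a minimal sketch using the AdminClient API (available in kafka-clients 0.11+, i.e. newer than the 0.9.0.1 client shown in the question; the topic name and bootstrap address are placeholders):
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class DescribeTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singleton("topic1"))
                    .all().get().get("topic1");
            // Print leader, replicas and ISR per partition; a missing leader points to the
            // unavailable-replica situation described above.
            desc.partitions().forEach(p ->
                    System.out.printf("partition %d: leader=%s replicas=%s isr=%s%n",
                            p.partition(), p.leader(), p.replicas(), p.isr()));
        }
    }
}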

You have to keep your Kafka replicas and the replication factor in your code the same.
For me, I keep 3 replicas and a replication factor of 3.

The solution for me was that I had to make sure KAFKA_ADVERTISED_HOST_NAME was the correct IP address of the server.

We had the same issue: replicas and the replication factor were both 3, and the partition count was 1. I increased the partition count to 10 and it started working.

We faced the same issue in production too. The code had been working fine for a long time when we suddenly got this exception.
We analyzed it and found no issue in the code, so we asked the deployment team to restart ZooKeeper. Restarting it solved the issue.

Related

My kafka consumer does not consume messages from offset 0 as expected

kafka version: 5.5.3-ccs
Good morning,
I noticed a behavior with my Kafka consumer written in Java and I hope someone can help me with this.
My consumer looks like this:
while (true) {
    long pollInterval = 50; // milliseconds
    ConsumerRecords<String, String> records = this.consumer.poll(Duration.ofMillis(pollInterval));
    for (ConsumerRecord<String, String> rec : records) {
        logger.debug(String.format("processing record(topic=%s, partition=%s, offset=%s, key=%s)", rec.topic(), rec.partition(), rec.offset(), rec.key()));
    }
}
I recreate my topics before any tests because our test chain will create new topics for each of them.
The commands I'm using to recreate my topics are
./kafka-topics --bootstrap-server host:port --topic ${TOPIC} --delete
./kafka-topics --bootstrap-server host:port --topic ${TOPIC} --create --replication-factor 2 --partitions 2
After my test, I know the number of messages that were posted, but my log file shows fewer messages were consumed.
I checked whether my consumer group was lagging for this topic, but that wasn't the case:
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG
test-group topic 0 19464 19464 0
test-group topic 1 19275 19275 0
Then, looking in detail at my log file, I noticed the first messages consumed for both partitions didn't start from offset 0 (offset 79 for partition 0 and offset 70 for partition 1):
2021-09-13 15:15:16,780[Thread-0][MyKafkaConsumer][getMessage]-DEBUG- processing record(topic=topic, partition=1, offset=70, key=6154475)
2021-09-13 15:15:17,046[Thread-0][MyKafkaConsumer][getMessage]-DEBUG- processing record(topic=topic, partition=0, offset=79, key=6154476)
My understanding is that if I recreate my topics, it's like resetting the offsets, and in that case I should consume all messages posted to the topic from the beginning. Here I'm basically missing 149 messages.
Could anyone explain what I'm doing wrong?
Thanks in advance
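One likely cause, not shown in the output above: when the group has no committed offset for a freshly recreated topic, the consumer falls back to auto.offset.reset, which defaults to latest, so it starts at whatever the log end is once the group finishes joining and skips anything produced before that. A minimal sketch of forcing consumption from the earliest offset regardless of committed offsets (bootstrap address, topic and group names are illustrative):
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "host:port");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "test-group");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // only used when the group has no committed offset

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singleton("topic"));
while (consumer.assignment().isEmpty()) {
    consumer.poll(Duration.ofMillis(100)); // poll until the group join completes and partitions are assigned
}
consumer.seekToBeginning(consumer.assignment()); // rewind every assigned partition to the earliest offset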

Kafka consumer unable to read data on Windows

I am trying to write a simple Kafka consumer to read data on a Windows machine, but my consumer is not able to read any data. There are more than 20 messages produced by the producer, but none is getting consumed.
Below is my code for the consumer :
Properties properties = new Properties();
properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "my_application");
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
// properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

// create the consumer
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(properties);
consumer.subscribe(Collections.singleton("first_topic"));

ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> data : records) {
    System.out.println("Key: " + data.key() + " and value: " + data.value());
    System.out.println("Topic: " + data.topic());
    System.out.println("Partition: " + data.partition());
}
The same issue was happening when I used the console consumer with this command:
kafka-console-consumer.bat --bootstrap-server 127.0.0.1:9092 --from-beginning --topic first_topic --group my-application
I had to use the --partition 0 option explicitly to get the messages.
Is there a way I can fix this ?
Use either of the following properties in the code:
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
This tells the consumer to read only the latest messages, that is, messages published after the consumer started.
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
With this, when a consumer starts consuming messages from a partition and has no committed offset, it will start reading from the beginning.
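Below is a minimal sketch combining that property with a poll loop, since the single poll() call in the question's code often returns nothing: the first poll may spend its whole timeout on the group join and partition assignment (configuration names are the ones from the question):
properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(properties);
consumer.subscribe(Collections.singleton("first_topic"));
while (true) {
    // keep polling; records usually start arriving only after the group rebalance completes
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> data : records) {
        System.out.println("Key: " + data.key() + " and value: " + data.value());
    }
    consumer.commitSync(); // auto-commit is disabled in the question, so commit manually
}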
Or use the following command:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Also note that ----group my-application should be --group my-application.
https://kafka.apache.org/quickstart
Please follow the below steps.
a. Get the partitions of a topic
./kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test
Topic: test PartitionCount: 1 ReplicationFactor: 1 Configs: segment.bytes=1073741824
Topic: test Partition: 0 Leader: 0 Replicas: 0 Isr: 0
b. Get the offset of the topic.
./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test --time -1
test:0:11
c. Consume the messages of a partition
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --partition 0
You can also specify an --offset parameter that indicates which offset to start from. If absent, consumption starts at the end of the partition.
You can also use a GUI-based tool named Kafka Tool to view the data in each partition of a topic: http://www.kafkatool.com. It's a tool to manage your Kafka cluster.
Java Code
TopicPartition partition0 = new TopicPartition(topic, 0);
TopicPartition partition1 = new TopicPartition(topic, 1);
consumer.assign(Arrays.asList(partition0, partition1));
You can find a detailed Java implementation at the following URLs:
https://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
https://www.logicbig.com/tutorials/misc/kafka/manually-assign-partition-to-a-consumer.html
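Building on the assign() snippet above, a minimal sketch of reading one partition from a chosen offset in Java, the programmatic equivalent of the --partition/--offset console options (it assumes the same consumer/properties setup as in the question; topic name and offset are illustrative):
TopicPartition partition0 = new TopicPartition("first_topic", 0);
consumer.assign(Collections.singleton(partition0)); // manual assignment: no consumer group rebalance involved
consumer.seek(partition0, 0L);                      // start from offset 0 (or any offset you choose)
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    records.forEach(r -> System.out.printf("partition=%d offset=%d value=%s%n",
            r.partition(), r.offset(), r.value()));
}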

Kafka consumer tries to connect to a random hostname instead of the right one

I'm new to Kafka and started exploring with a sample program. It used to work without any issue, but all of a sudden the consumer.poll() call hangs and never returns. Googling suggested checking that the servers are accessible. The producer and consumer Java code runs on the same machine; the producer is able to post records to Kafka, but the consumer poll method hangs.
Environment:
Kafka version: 1.1.0
Client: Java
Runs in Ubuntu docker container inside windows
Zookeeper and 2 Broker servers runs in same container
When I have enabled logging for client code, I see below exception:
2018-07-06 21:24:18 DEBUG NetworkClient:802 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Error connecting to node 4bdce773eb74:9095 (id: 2 rack: null)
java.io.IOException: Can't resolve address: 4bdce773eb74:9095
at org.apache.kafka.common.network.Selector.doConnect(Selector.java:235)
at org.apache.kafka.common.network.Selector.connect(Selector.java:214)
.................
.................
I'm not sure why the consumer is trying to connect to 4bdce773eb74 even though my broker servers are 192.168.99.100:9094,192.168.99.100:9095. My full consumer code:
final String BOOTSTRAP_SERVERS = "192.168.99.100:9094,192.168.99.100:9095";
final Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
props.put(ConsumerConfig.GROUP_ID_CONFIG, "Event_Consumer");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

KafkaConsumer<Long, String> consumer = new KafkaConsumer<Long, String>(props);

TopicPartition tpLogin = new TopicPartition("login1", 0);
TopicPartition tpLogout = new TopicPartition("logout1", 1);
List<TopicPartition> tps = Arrays.asList(tpLogin, tpLogout);
consumer.assign(tps);

while (true) {
    final ConsumerRecords<Long, String> consumerRecords = consumer.poll(1000);
    if (consumerRecords.count() == 0) {
        continue;
    }
    consumerRecords.forEach(record -> {
        System.out.printf("Consumer Record:(%d, %s, %d, %d)\n", record.key(), record.value(),
                record.partition(), record.offset());
    });
    consumer.commitAsync();
    Thread.sleep(5000);
}
}
Please help with this issue.
EDIT
As I said earlier, I have 2 brokers, say broker-1 and broker-2. If I stop broker-1, the above exception is not logged, but the poll() method still doesn't return.
The message below is logged indefinitely if I stop broker-1:
2018-07-07 11:31:24 DEBUG AbstractCoordinator:579 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Sending FindCoordinator request to broker 192.168.99.100:9094 (id: 1 rack: null)
2018-07-07 11:31:24 DEBUG AbstractCoordinator:590 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Received FindCoordinator response ClientResponse(receivedTimeMs=1530943284196, latencyMs=2, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, clientId=consumer-1, correlationId=573), responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null)))
2018-07-07 11:31:24 DEBUG AbstractCoordinator:613 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Group coordinator lookup failed: The coordinator is not available.
2018-07-07 11:31:24 DEBUG AbstractCoordinator:227 - [Consumer clientId=consumer-1, groupId=IDCS_Audit_Event_Consumer] Coordinator discovery failed, refreshing metadata
Thanks in Advance,
Soman
I found the issue. When I created the topic, broker-0 (runs on port 9093, broker id 0) and broker-2 (runs on port 9094, broker id 2) were running. Today I mistakenly started broker-1 (runs on port 9095, broker id 1) and broker-2. Stopping broker-1 and starting broker-0 resolved the issue. Now the consumer is able to get the events.
It was definitely human error on my side, but I have 2 comments:
I think Kafka should gracefully use broker-2 (port 9094) and ignore broker-1 (port 9095).
Why is Kafka trying to contact 4bdce773eb74:9095 instead of the right IP address (192.168.99.100)?
thanks.
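A probable explanation for the second point (an assumption based on the Docker setup, not something visible in the posted logs): the client uses the bootstrap list only for the initial connection; after that it connects to whatever host:port each broker advertises in its metadata, and inside a container that defaults to the container hostname (here 4bdce773eb74). Setting the advertised listener to a reachable address on each broker should avoid this, e.g. for the broker on port 9095:
listeners=PLAINTEXT://0.0.0.0:9095
advertised.listeners=PLAINTEXT://192.168.99.100:9095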

Apache Kafka Cluster start fails with NoNodeException

I'm trying to start a Spark Streaming session which consumes from a Kafka queue, and I'm using ZooKeeper for config management. However, when I try to start it, the following exception is thrown:
18/03/26 09:25:49 INFO ZookeeperConnection: Checking Kafka topic core-data-tickets does exists ...
18/03/26 09:25:49 INFO Broker: Kafka topic core-data-tickets exists
18/03/26 09:25:49 INFO Broker: Processing topic : core-data-tickets
18/03/26 09:25:49 WARN ZookeeperConnection: Resetting Topic Offset
org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/clt/offsets/core-data-tickets/4
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
at kafka.utils.ZkUtils$.readData(ZkUtils.scala:443)
at kafka.utils.ZkUtils.readData(ZkUtils.scala)
at net.core.data.connection.ZookeeperConnection.readTopicPartitionOffset(ZookeeperConnection.java:145)
I have already created the relevant Kafka topic.
Any insights on this would be highly appreciated.
I'm using the following command to run the Spark job:
spark-submit --class net.core.data.compute.Broker --executor-memory 512M --total-executor-cores 2 --driver-java-options "-Dproperties.path=/ebs/tmp/continuous-loading-tool/continuous-loading-tool/src/main/resources/dev.properties" --conf spark.ui.port=4045 /ebs/tmp/dev/data/continuous-loading-tool/target/continuous-loading-tool-1.0-SNAPSHOT.jar
I guess that this error has to do with offsets retention. By default, offsets are stored for only 1440 minutes (i.e. 24 hours). Therefore, if the group has not committed offsets within a day, Kafka won't have information about it.
A possible workaround is to set the value of offsets.retention.minutes accordingly.
offsets.retention.minutes
Offsets older than this retention period will be discarded
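For example, to keep committed offsets for two days instead of the default (the value is illustrative; this is a broker setting in server.properties and requires a broker restart):
offsets.retention.minutes=2880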

How to make Kafka broker failover work correctly for consumers?

It seems very complicated to make a replicated broker setup work correctly with consumers: when I stop certain brokers, some consumers don't work anymore, and when that broker is up again, those consumers receive all the "missing" messages at once.
I am using a 2 brokers scenario. Created a replicated topic like this:
$KAFKA_HOME/bin/kafka-topics.sh --create \
--zookeeper localhost:2181 \
--replication-factor 2 \
--partitions 3 \
--topic replicated_topic
The excerpt from the server config looks like this (note it is the same for both servers except the port, broker id and log dir):
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3
# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma seperated list of directories under which to store log files
log.dirs=/tmp/kafka-logs0
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=localhost:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
Let's describe my topic using 2 brokers:
Topic:replicated_topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: replicated_topic Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: replicated_topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 1,0
Topic: replicated_topic Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
Let's see the code for the consumer:
Consumer (implements Callable):
@Override
public Void call() throws Exception {
    final Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, IntegerDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

    final Consumer<Integer, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList(topicName));

    ConsumerRecords<Integer, String> records = null;
    while (!Thread.currentThread().isInterrupted()) {
        records = consumer.poll(1000);
        if (records.isEmpty()) {
            continue;
        }
        records.forEach(rec -> LOGGER.debug("{}#{} consumed from topic {}, partition {} pair ({},{})",
                ConsumerCallable.class.getSimpleName(), hashCode(), rec.topic(), rec.partition(), rec.key(), rec.value()));
        consumer.commitAsync();
    }
    consumer.close();
    return null;
}
Producer and main code:
private static final String TOPIC_NAME = "replicated_topic";
private static final String BOOTSTRAP_SERVERS = "localhost:9092, localhost:9093";
private static final Logger LOGGER = LoggerFactory.getLogger(Main.class);

public static void main(String[] args) {
    ExecutorService executor = Executors.newCachedThreadPool();
    executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group1"));
    executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group2"));
    executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group3"));

    try (Producer<Integer, String> producer = createProducer()) {
        Scanner scanner = new Scanner(System.in);
        String line = null;
        LOGGER.debug("Please enter 'k v' on the command line to send to Kafka or 'quit' to exit");
        while (scanner.hasNextLine()) {
            line = scanner.nextLine();
            if (line.trim().toLowerCase().equals("quit")) {
                break;
            }
            String[] elements = line.split(" ");
            Integer key = Integer.parseInt(elements[0]);
            String value = elements[1];
            producer.send(new ProducerRecord<>(TOPIC_NAME, key, value));
            producer.flush();
        }
    }
    executor.shutdownNow();
}

private static Producer<Integer, String> createProducer() {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
    props.put(ProducerConfig.CLIENT_ID_CONFIG, "KafkaExampleProducer");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, IntegerSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    return new KafkaProducer<>(props);
}
Now let's see the behaviour:
All brokers are up:
Output of kafka topic:
Topic:replicated_topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: replicated_topic Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: replicated_topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 1,0
Topic: replicated_topic Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
Output of program:
12:52:30.460 DEBUG Main - Please enter 'k v' on the command line to send to Kafka or 'quit' to exit
1 u
12:52:35.555 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 0 pair (1,u)
12:52:35.559 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 0 pair (1,u)
12:52:35.559 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 0 pair (1,u)
2 d
12:52:38.096 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 2 pair (2,d)
12:52:38.098 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 2 pair (2,d)
12:52:38.100 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 2 pair (2,d)
Since the consumers are in different groups, all messages are broadcast to all of them; everything is OK.
Bring down broker 2:
Describe topic:
Topic:replicated_topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: replicated_topic Partition: 0 Leader: 0 Replicas: 1,0 Isr: 0
Topic: replicated_topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0
Topic: replicated_topic Partition: 2 Leader: 0 Replicas: 1,0 Isr: 0
Output of program:
3 t
12:57:03.898 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 1 pair (3,t)
4 p
12:57:06.058 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 1 pair (4,p)
Now only 1 consumer receives data. Let's bring up broker 2 again:
Now the other 2 consumers receive the missing data:
12:57:50.863 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 1 pair (3,t)
12:57:50.863 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 1 pair (4,p)
12:57:50.870 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 1 pair (3,t)
12:57:50.870 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 1 pair (4,p)
Bring down broker 1:
Now only 2 consumers receive data:
5 c
12:59:13.718 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 2 pair (5,c)
12:59:13.737 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 2 pair (5,c)
6 s
12:59:16.437 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 2 pair (6,s)
12:59:16.438 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 2 pair (6,s)
If I bring it up again, the other consumer will also receive the missing data.
My point (sorry for the big write-up, but I am trying to capture the context) is: how do I make sure that no matter which broker I stop, the consumers keep working correctly (i.e. receive all messages normally)?
PS: I tried setting offsets.topic.replication.factor=2 or 3, but it didn't have any effect.
Messages will not be lost if the number of alive brokers is less than the configured replicas; whenever the Kafka broker rejoins the cluster, the data gets replicated to that node again. https://stackoverflow.com/a/38998062/6274525
So when your broker 2 goes down, the messages still get pushed to the other alive broker, because there is 1 live broker and the replication factor is 2. Since your other 2 consumers were relying on broker 2 (which is down) as their group coordinator, they are unable to consume.
When your broker 2 is up again, the data gets replicated to it again, and hence the consumers attached to it receive the messages (referred to by you as the "missing" messages).
Please make sure the property offsets.topic.replication.factor is set to at least 2 (3 is the usual recommendation when you have 3 or more brokers).
This property is used to manage offset and consumer interaction. When a Kafka server is started, it auto-creates a topic named __consumer_offsets. So if replicas are not created for this topic, a consumer cannot know for sure whether something has been pushed to the topic it is listening to.
Link to detail of this property : https://kafka.apache.org/documentation/#brokerconfigs
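One caveat worth adding (it likely explains the asker's note that changing offsets.topic.replication.factor had no effect): the setting is only applied when the internal __consumer_offsets topic is first auto-created, so changing it afterwards does not modify the already existing topic. You can verify the actual replica placement of the offsets topic with:
./kafka-topics.sh --describe --zookeeper localhost:2181 --topic __consumer_offsets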
