How to make Kafka Source reconnect when Kafka restarts - java

I create a Source of consumer records using Reactive Kafka as follows:
val settings = ConsumerSettings(system, keyDeserializer, valueDeserializer)
.withBootstrapServers(bootstrapServers)
.withGroupId(groupName)
// what offset to begin with if there's no offset for this group
.withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
// do we want to automatically commit offsets?
.withProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true")
// auto-commit offsets every 1 minute, in the background
.withProperty(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000")
// reconnect every 1 second, when disconnected
.withProperty(ConsumerConfig.RECONNECT_BACKOFF_MS_CONFIG, "1000")
// every session lasts 30 seconds
.withProperty(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000")
// send heartbeat every 10 seconds i.e. 1/3 * session.timeout.ms
.withProperty(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "10000")
// how many records to fetch in each poll( )
.withProperty(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100")
Consumer.atMostOnceSource(settings, Subscriptions.topics(topic)).map(_.value)
I have 1 instance of Kafka running on my local machine. I push values into the topic via the console producer and see them printed out. Then I kill Kafka, and restart it to see if the source reconnects.
These are how my logs proceed:
* Connection with /192.168.0.1 disconnected
java.net.ConnectException: Connection refused
* Give up sending metadata request since no node is available
* Consumer interrupted with WakeupException after timeout. Message: null. Current value of akka.kafka.consumer.wakeup-timeout is 3000 milliseconds
* Resuming partition test-events-0
* Error while fetching metadata with correlation id 139 : {test-events=INVALID_REPLICATION_FACTOR}
* Sending metadata request (type=MetadataRequest, topics=test-events) to node 0
* Sending GroupCoordinator request for group mytestgroup to broker 192.168.0.1:9092 (id: 0 rack: null)
* Consumer interrupted with WakeupException after timeout. Message: null. Current value of akka.kafka.consumer.wakeup-timeout is 3000 milliseconds
* Received GroupCoordinator response ClientResponse(receivedTimeMs=1491797713078, latencyMs=70, disconnected=false, requestHeader={api_key=10,api_version=0,correlation_id=166,client_id=consumer-1}, responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}) for group mytestgroup
* Error while fetching metadata with correlation id 169 : {test-events=INVALID_REPLICATION_FACTOR}
* Received GroupCoordinator response ClientResponse(receivedTimeMs=1491797716169, latencyMs=72, disconnected=false, requestHeader={api_key=10,api_version=0,correlation_id=196,client_id=consumer-1}, responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}) for group mytestgroup
09:45:16.169 [testsystem-akka.kafka.default-dispatcher-16] DEBUG o.a.k.c.c.i.AbstractCoordinator - Group coordinator lookup for group mytestgroup failed: The group coordinator is not available.
09:45:16.169 [testsystem-akka.kafka.default-dispatcher-16] DEBUG o.a.k.c.c.i.AbstractCoordinator - Coordinator discovery failed for group mytestgroup, refreshing metadata
* Initiating API versions fetch from node 2147483647
* Offset commit for group mytestgroup failed: This is not the correct coordinator for this group.
* Marking the coordinator 192.168.43.25:9092 (id: 2147483647 rack: null) dead for group mytestgroup
* The Kafka consumer has closed.
How do I make sure that this Source reconnects and continues processing the logs?

I think you need to have at least 2 brokers. If one fails the other one can do the job and you could restart the other one.

Related

io.smallrye.mutiny.TimeoutException when using kafka vs redis

I'm using kafka + redis in my project.
I get message from Kafka, process and save to redis, but it is giving error like below when my code runs after some time my code
io.smallrye.mutiny.TimeoutException
at io.smallrye.mutiny.operators.uni.UniBlockingAwait.await(UniBlockingAwait.java:64)
at io.smallrye.mutiny.groups.UniAwait.atMost(UniAwait.java:65)
at io.quarkus.redis.client.runtime.RedisClientImpl.await(RedisClientImpl.java:1046)
at io.quarkus.redis.client.runtime.RedisClientImpl.set(RedisClientImpl.java:687)
at worker.redis.process.implementation.ProductImplementation.refresh(ProductImplementation.java:34)
at worker.redis.Worker.refresh(Worker.java:51)
at
kafka.InComingProductKafkaConsume.lambda$consume$0(InComingProductKafkaConsume.java:38)
at business.core.hpithead.ThreadStart.doRun(ThreadStart.java:34)
at business.core.hpithead.core.NotifyingThread.run(NotifyingThread.java:27)
at java.base/java.lang.Thread.run(Thread.java:833)
The record 51761 from topic-partition 'mer-outgoing-master-item-0' has waited for 153 seconds to be acknowledged. This waiting time is greater than the configured threshold (150000 ms). At the moment 2 messages from this partition are awaiting acknowledgement. The last committed offset for this partition was 51760. This error is due to a potential issue in the application which does not acknowledged the records in a timely fashion. The connector cannot commit as a record processing has not completed.
#Incoming("mer_product")
#Blocking
public CompletionStage<Void> consume2(Message<String> payload) {
var objectDto = configThreadLocal.mapper.readValue(payload.getPayload(), new TypeReference<KafkaPayload<ItemKO>>(){});
worker.refresh(objectDto.payload.castDto());
return payload.ack();
}

Is there a "Circuit Breaker" for Spring Boot Kafka client?

In case that Kafka server is (temporarily) down, my Spring Boot application ReactiveKafkaConsumerTemplate keeps trying to connect unsuccessfully, thus causing unnecessary traffic and messing the log files:
2021-11-10 14:45:30.265 WARN 24984 --- [onsumer-group-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-group-1, groupId=consumer-group] Connection to node -1 (localhost/127.0.0.1:29092) could not be established. Broker may not be available.
2021-11-10 14:45:32.792 WARN 24984 --- [onsumer-group-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-group-1, groupId=consumer-group] Bootstrap broker localhost:29092 (id: -1 rack: null) disconnected
2021-11-10 14:45:34.845 WARN 24984 --- [onsumer-group-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-group-1, groupId=consumer-group] Connection to node -1 (localhost/127.0.0.1:29092) could not be established. Broker may not be available.
2021-11-10 14:45:34.845 WARN 24984 --- [onsumer-group-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-group-1, groupId=consumer-group] Bootstrap broker localhost:29092 (id: -1 rack: null) disconnected
Is it possible to use something like a circuit breaker (an inspiration here or here), so the Spring Boot Kafka client in case of a failure (or even better a few consecutive failures) slows down the pace of its connection attempts, and returns to the normal pace only after the server is up again?
Is there already a ready-made config parameter, or any other solution?
I am aware of the parameter reconnect.backoff.ms, this is how I create the ReactiveKafkaConsumerTemplate bean:
#Bean
public ReactiveKafkaConsumerTemplate<String, MyEvent> kafkaConsumer(KafkaProperties properties) {
final Map<String, Object> map = new HashMap<>(properties.buildConsumerProperties());
map.put(ConsumerConfig.GROUP_ID_CONFIG, "MyGroup");
map.put(ConsumerConfig.RECONNECT_BACKOFF_MS_CONFIG, 10_000L);
final JsonDeserializer<DisplayCurrencyEvent> jsonDeserializer = new JsonDeserializer<>();
jsonDeserializer.addTrustedPackages("com.example.myapplication");
return new ReactiveKafkaConsumerTemplate<>(
ReceiverOptions
.<String, MyEvent>create(map)
.withKeyDeserializer(new ErrorHandlingDeserializer<>(new StringDeserializer()))
.withValueDeserializer(new ErrorHandlingDeserializer<>(jsonDeserializer))
.subscription(List.of("MyTopic")));
}
And still the consumer is trying to connect every 3 seconds.
See https://kafka.apache.org/documentation/#consumerconfigs_retry.backoff.ms
The base amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all connection attempts by the client to a broker.
and https://kafka.apache.org/documentation/#consumerconfigs_reconnect.backoff.max.ms
The maximum amount of time in milliseconds to wait when reconnecting to a broker that has repeatedly failed to connect. If provided, the backoff per host will increase exponentially for each consecutive connection failure, up to this maximum. After calculating the backoff increase, 20% random jitter is added to avoid connection storms.
and

After enabling checkpoint - org.apache.flink.runtime.resourcemanager.exceptions.UnfulfillableSlotRequestException

I have enabled checkpoint in flink 1.12.1 programmatically as below:
int duration = 10 ;
if (!environment.getCheckpointConfig().isCheckpointingEnabled()) {
environment.enableCheckpointing(duration * 6 * 1000, CheckpointingMode.EXACTLY_ONCE);
environment.getCheckpointConfig().setMinPauseBetweenCheckpoints(duration * 3 * 1000);
}
Flink Version: 1.12.1
configuration:
state.backend: rocksdb
state.checkpoints.dir: file:///flink/
blob.server.port: 6124
jobmanager.rpc.port: 6123
parallelism.default: 2
queryable-state.proxy.ports: 6125
taskmanager.numberOfTaskSlots: 2
taskmanager.rpc.port: 6122
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 1728m
jobmanager.web.address: 0.0.0.0
rest.address: 0.0.0.0
rest.bind-address: 0.0.0.0
rest.port: 8081
taskmanager.data.port: 6121
classloader.resolve-order: parent-first
execution.checkpointing.unaligned: false
execution.checkpointing.max-concurrent-checkpoints: 2
execution.checkpointing.interval: 60000
But it is failing with following error:
Caused by: org.apache.flink.runtime.resourcemanager.exceptions.UnfulfillableSlotRequestException: Could not fulfill slot request 44ec308e34aa86629d2034a017b8ef91. Requested resource profile (ResourceProfile{UNKNOWN}) is unfulfillable.
If I remove/disable checkpoint, everything works normally. I have checkpoint requirement because, if my pod, gets restart, data which is being handled by earlier run gets reset.
Can somebody direct, how this can be addressed?

How to make Kafka broker failover to work regarding consumers?

It seems very complicated to make a replicated broker work regarding consumers: it seems when stopping certain brokers, some consumers don't work anymore and, when the specific broker is up again, those consumers that didn't work receive all the "missing" messages.
I am using a 2 brokers scenario. Created a replicated topic like this:
$KAFKA_HOME/bin/kafka-topics.sh --create \
--zookeeper localhost:2181 \
--replication-factor 2 \
--partitions 3 \
--topic replicated_topic
The excerpt from the server config looks like this ( notice it is the same for both servers except port, broker id and log dir):
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3
# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma seperated list of directories under which to store log files
log.dirs=/tmp/kafka-logs0
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=localhost:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
Let's decribe my topic using 2 brokers:
Topic:replicated_topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: replicated_topic Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: replicated_topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 1,0
Topic: replicated_topic Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
Let's see the code for the consumer:
Consumer ( impl Callable )
#Override
public Void call() throws Exception {
final Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
bootstrapServers);
props.put(ConsumerConfig.GROUP_ID_CONFIG,
groupId);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
IntegerDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
StringDeserializer.class.getName());
final Consumer<Integer, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList(topicName));
ConsumerRecords<Integer, String> records = null;
while (!Thread.currentThread().isInterrupted()) {
records = consumer.poll(1000);
if (records.isEmpty()) {
continue;
}
records.forEach(rec -> LOGGER.debug("{}#{} consumed from topic {}, partition {} pair ({},{})",
ConsumerCallable.class.getSimpleName(), hashCode(), rec.topic(), rec.partition(), rec.key(), rec.value()));
consumer.commitAsync();
}
consumer.close();
return null;
}
Producer and main code:
private static final String TOPIC_NAME = "replicated_topic";
private static final String BOOTSTRAP_SERVERS = "localhost:9092, localhost:9093";
private static final Logger LOGGER = LoggerFactory.getLogger(Main.class);
public static void main(String[] args) {
ExecutorService executor = Executors.newCachedThreadPool();
executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group1"));
executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group2"));
executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group3"));
try (Producer<Integer, String> producer = createProducer()) {
Scanner scanner = new Scanner(System.in);
String line = null;
LOGGER.debug("Please enter 'k v' on the command line to send to Kafka or 'quit' to exit");
while (scanner.hasNextLine()) {
line = scanner.nextLine();
if (line.trim().toLowerCase().equals("quit")) {
break;
}
String[] elements = line.split(" ");
Integer key = Integer.parseInt(elements[0]);
String value = elements[1];
producer.send(new ProducerRecord<>(TOPIC_NAME, key, value));
producer.flush();
}
}
executor.shutdownNow();
}
private static Producer<Integer, String> createProducer() {
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
BOOTSTRAP_SERVERS);
props.put(ProducerConfig.CLIENT_ID_CONFIG, "KafkaExampleProducer");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
IntegerSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
StringSerializer.class.getName());
return new KafkaProducer<>(props);
}
Now let's see the behaviour:
All brokers are up:
Output of kafka topic:
Topic:replicated_topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: replicated_topic Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: replicated_topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 1,0
Topic: replicated_topic Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
Output of program:
12:52:30.460 DEBUG Main - Please enter 'k v' on the command line to send to Kafka or 'quit' to exit
1 u
12:52:35.555 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 0 pair (1,u)
12:52:35.559 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 0 pair (1,u)
12:52:35.559 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 0 pair (1,u)
2 d
12:52:38.096 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 2 pair (2,d)
12:52:38.098 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 2 pair (2,d)
12:52:38.100 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 2 pair (2,d)
Since the consumers are in different groups all messages are broadcasted to them, everything is ok.
2 Bring down broker 2:
Describe topic:
Topic:replicated_topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: replicated_topic Partition: 0 Leader: 0 Replicas: 1,0 Isr: 0
Topic: replicated_topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0
Topic: replicated_topic Partition: 2 Leader: 0 Replicas: 1,0 Isr: 0
Output of program:
3 t
12:57:03.898 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 1 pair (3,t)
4 p
12:57:06.058 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 1 pair (4,p)
Now only 1 consumer receives data. Let's bring up broker 2 again:
Now the other 2 consumers receive the missing data:
12:57:50.863 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 1 pair (3,t)
12:57:50.863 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 1 pair (4,p)
12:57:50.870 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 1 pair (3,t)
12:57:50.870 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 1 pair (4,p)
Bring down broker 1:
Now only 2 consumers receive data:
5 c
12:59:13.718 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 2 pair (5,c)
12:59:13.737 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 2 pair (5,c)
6 s
12:59:16.437 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 2 pair (6,s)
12:59:16.438 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 2 pair (6,s)
If I bring it on the other consumer wil also receive missing data.
My point guys ( sorry for big write but I am trying to capture the context ), is how to make sure that no matter what broker I would stop, the consumers would work correctly? ( receive all messages normally )?
PS: I tried setting the offsets.topic.replication.factor=2 or 3, but it didn't have any effect.
Messages to that broker will not be ignored if the no. of alive brokers is lesser than the configured replicas. Whenever a new Kafka broker joins the cluster, the data gets replicated to that node. https://stackoverflow.com/a/38998062/6274525
So when your broker 2 goes down, the messages still get pushed to another alive broker because there is 1 live broker and replication factor is 2. Since your other 2 consumers are subscribed to broker 2 (which is down), they are unable to consume.
When your broker 2 is up again, the data gets duplicated to this new node and hence the consumers attached to this node receive the message (referred by you as "missing" messages).
Please make sure you have changed the property called offsets.topic.replication.factor to atleast 3.
This property is used to manage offset and consumer interaction. When a kafka server is started, it auto creates a topic with name __consumer_offsets. So if the replicas are not created in this topic, then a consumer cannot know for sure if something has been pushed to the Topic it was listening to.
Link to detail of this property : https://kafka.apache.org/documentation/#brokerconfigs

Kafka Consumer (Java) polls 0 messages

I am seeing this issue in my Kafka Java client where the consumer stop consuming after polling a few messages. Its not that the consumer hangs. Its unable to find messages in the topic partitions and polls 0 message. I have 4 partitions configured for the topic and 2 consumers for the consumer group.
Consumer Log:
Thread-5:2016-05-11 at 07:35:21.893 UTC INFO xxxxx.KafkaConsumerClient:71 pullFromQueue polled 0 messages from topic: test:[test-0, test-1] partition : []
Thread-5:2016-05-11 at 07:35:31.893 UTC INFO xxxxx.KafkaConsumerClient:71 pullFromQueue polled 0 messages from topic: test:[test-0, test-1] partition : []
Thread-5:2016-05-11 at 07:35:41.893 UTC INFO xxxxx.KafkaConsumerClient:71 pullFromQueue polled 0 messages from topic: test:[test-0, test-1] partition : []
Thread-5:2016-05-11 at 07:35:51.893 UTC INFO xxxxx.KafkaConsumerClient:71 pullFromQueue polled 0 messages from topic: test:[test-0, test-1] partition : []
Here, the log suggests that this consumer is connected to partitions 0 and 1 but is unable to consumer any message.
Consumer Offset:
Group Topic Pid Offset logSize Lag Owner
test-consumer test 0 1147335 1150034 2699 none
test-consumer test 1 1147471 1150033 2562 none
test-consumer test 2 1150035 1150035 0 none
test-consumer test 3 1150031 1150031 0 none
Here this shows that my topic has 2699 and 2562 messages pending on partitions 0 and 1 respectively

Categories