KTable causes unsubscribe from topics - java

I'm writing a basic Kafka Streams app in Java that reads Wikipedia events provided by a producer and attempts to count the number of created and recently changed events per user type (bot or human).
I created a custom Serde for the Wikipedia events and can successfully print both the created and the modified events to the screen from my KStreams.
My next step was to create a KTable in which I will count the created events per user type.
It seems that after the KTable is created the rest of the code does not execute.
I don't get an error message and my app seems to be running, but nothing is printed and maybe not even processed.
My code is as follows:
StreamsBuilder builder = new StreamsBuilder();
KStream<String, WikiEvent> allEvents =
builder.stream(topicList, Consumed.with(Serdes.String(), WikiEventSerdes.WikiEvent()));
KStream<String, WikiEvent> createEvents = allEvents.filter((key, value) -> value.getStream().equals("create"));
KStream<String, WikiEvent> changeEvents = allEvents.filter((key, value) -> value.getStream().equals("change"));
createEvents.foreach((k,v)->System.out.println("p2 Key= " + k + " Value=" + v.getStream()));
KTable<String, Long> createdPagesUserTypeTable = createEvents.groupBy((key, value) -> value.getUserType()).count();
KStream<String, Long> tableStream = createdPagesUserTypeTable.toStream();
tableStream.foreach((k,v)->System.out.println("Key= " + k + " Value=" + v));
I suspect that nothing executes past the KTable because the print of the createEvents stream never happens when the KTable definition is present.
Once I remove the KTable and all the lines after it, I get the prints.
What's gone wrong here?
Also, is there a log of some sort where I can see the execution of my code?
An update:
After looking at the server logs I see this when defining the KTable:
[2022-05-27 19:58:38,983] INFO [GroupCoordinator 0]: Dynamic member with unknown member id joins group streams-wiki in Empty state. Created a new member id streams-wiki-8ea96db7-0052-421a-b7c0-a56cedf9f43e-StreamThread-1-consumer-aa4e311e-2712-4054-ac59-9b56f13d2231 and request the member to rejoin with this id. (kafka.coordinator.group.GroupCoordinator)
[2022-05-27 19:58:38,995] INFO [GroupCoordinator 0]: Preparing to rebalance group streams-wiki in state PreparingRebalance with old generation 2 (__consumer_offsets-22) (reason: Adding new member streams-wiki-8ea96db7-0052-421a-b7c0-a56cedf9f43e-StreamThread-1-consumer-aa4e311e-2712-4054-ac59-9b56f13d2231 with group instance id None; client reason: rebalance failed due to 'The group member needs to have a valid member id before actually entering a consumer group.' (MemberIdRequiredException)) (kafka.coordinator.group.GroupCoordinator)
[2022-05-27 19:58:38,999] INFO [GroupCoordinator 0]: Stabilized group streams-wiki generation 3 (__consumer_offsets-22) with 1 members (kafka.coordinator.group.GroupCoordinator)
[2022-05-27 19:58:39,274] INFO [GroupCoordinator 0]: Assignment received from leader streams-wiki-8ea96db7-0052-421a-b7c0-a56cedf9f43e-StreamThread-1-consumer-aa4e311e-2712-4054-ac59-9b56f13d2231 for group streams-wiki for generation 3. The group has 1 members, 0 of which are static. (kafka.coordinator.group.GroupCoordinator)
[2022-05-27 19:58:39,934] INFO [GroupCoordinator 0]: Preparing to rebalance group streams-wiki in state PreparingRebalance with old generation 3 (__consumer_offsets-22) (reason: Removing member streams-wiki-8ea96db7-0052-421a-b7c0-a56cedf9f43e-StreamThread-1-consumer-aa4e311e-2712-4054-ac59-9b56f13d2231 on LeaveGroup; client reason: the consumer unsubscribed from all topics) (kafka.coordinator.group.GroupCoordinator)
[2022-05-27 19:58:39,934] INFO [GroupCoordinator 0]: Group streams-wiki with generation 4 is now empty (__consumer_offsets-22) (kafka.coordinator.group.GroupCoordinator)
[2022-05-27 19:58:39,938] INFO [GroupCoordinator 0]: Member MemberMetadata(memberId=streams-wiki-8ea96db7-0052-421a-b7c0-a56cedf9f43e-StreamThread-1-consumer-aa4e311e-2712-4054-ac59-9b56f13d2231, groupInstanceId=None, clientId=streams-wiki-8ea96db7-0052-421a-b7c0-a56cedf9f43e-StreamThread-1-consumer, clientHost=/127.0.0.1, sessionTimeoutMs=45000, rebalanceTimeoutMs=300000, supportedProtocols=List(stream)) has left group streams-wiki through explicit `LeaveGroup`; client reason: the consumer unsubscribed from all topics (kafka.coordinator.group.GroupCoordinator)
So it appears that my KTable has somehow caused an unsubscribe from all topics.
Any idea why this is happening?

In the end, it turned out that my Java Streams application was failing because StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG was not defined.
I found this by adding the following lines of code right after KafkaStreams streams = new KafkaStreams(topology, props);
streams.setUncaughtExceptionHandler((Thread t, Throwable e) -> {
System.out.println(e);
});
This prints the exception that terminated the stream thread to the console, which reveals why the consumer unsubscribed from its topics.
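For context on why the missing setting only matters once the KTable is added: groupBy assigns a new key, so Kafka Streams routes the records through an internal repartition topic and needs serdes to write them. Presumably, with no serde passed to groupBy and no default configured, the stream thread died, which matches the LeaveGroup in the broker log. A minimal sketch of the configuration side of the fix follows; the helper name WikiStreamsProps is hypothetical, and the keys are written as the plain strings behind the StreamsConfig constants named in the comments so the sketch compiles without Kafka on the classpath.

```java
import java.util.Properties;

/** Hypothetical helper that builds Kafka Streams settings including a default key serde. */
class WikiStreamsProps {

    static Properties build(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);       // StreamsConfig.APPLICATION_ID_CONFIG
        props.put("bootstrap.servers", bootstrapServers); // StreamsConfig.BOOTSTRAP_SERVERS_CONFIG
        // StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG: the setting that was missing.
        // The new key produced by groupBy (the user type) is a String.
        props.put("default.key.serde", "org.apache.kafka.common.serialization.Serdes$StringSerde");
        return props;
    }
}
```

The resulting Properties would be passed to new KafkaStreams(topology, props) as before. Alternatively, giving groupBy its serdes explicitly, e.g. createEvents.groupBy((key, value) -> value.getUserType(), Grouped.with(Serdes.String(), WikiEventSerdes.WikiEvent())), covers both the key and the WikiEvent value of the repartition topic without relying on defaults.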

Related

io.smallrye.mutiny.TimeoutException when using Kafka with Redis

I'm using Kafka and Redis in my project.
I get messages from Kafka, process them and save them to Redis, but after my code has been running for some time it throws the following error:
io.smallrye.mutiny.TimeoutException
at io.smallrye.mutiny.operators.uni.UniBlockingAwait.await(UniBlockingAwait.java:64)
at io.smallrye.mutiny.groups.UniAwait.atMost(UniAwait.java:65)
at io.quarkus.redis.client.runtime.RedisClientImpl.await(RedisClientImpl.java:1046)
at io.quarkus.redis.client.runtime.RedisClientImpl.set(RedisClientImpl.java:687)
at worker.redis.process.implementation.ProductImplementation.refresh(ProductImplementation.java:34)
at worker.redis.Worker.refresh(Worker.java:51)
at kafka.InComingProductKafkaConsume.lambda$consume$0(InComingProductKafkaConsume.java:38)
at business.core.hpithead.ThreadStart.doRun(ThreadStart.java:34)
at business.core.hpithead.core.NotifyingThread.run(NotifyingThread.java:27)
at java.base/java.lang.Thread.run(Thread.java:833)
The record 51761 from topic-partition 'mer-outgoing-master-item-0' has waited for 153 seconds to be acknowledged. This waiting time is greater than the configured threshold (150000 ms). At the moment 2 messages from this partition are awaiting acknowledgement. The last committed offset for this partition was 51760. This error is due to a potential issue in the application which does not acknowledged the records in a timely fashion. The connector cannot commit as a record processing has not completed.
@Incoming("mer_product")
@Blocking
public CompletionStage<Void> consume2(Message<String> payload) {
var objectDto = configThreadLocal.mapper.readValue(payload.getPayload(), new TypeReference<KafkaPayload<ItemKO>>(){});
worker.refresh(objectDto.payload.castDto());
return payload.ack();
}
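The warning above describes a commit strategy that can only commit offsets up to the first record that is still waiting for acknowledgement, so a single record whose processing never reaches payload.ack() (for example because it hangs or fails) holds back commits for the whole partition. A simplified model of that idea, offered as an illustration only and not as SmallRye's actual implementation (the class ContiguousAckTracker is hypothetical):

```java
import java.util.TreeMap;

/** Hypothetical model: commits may only advance over a contiguous run of acknowledged records. */
class ContiguousAckTracker {

    // offset -> acknowledged yet?
    private final TreeMap<Long, Boolean> inFlight = new TreeMap<>();
    // Next offset that could be committed (Kafka commits "the next offset to read").
    private long committable;

    ContiguousAckTracker(long startOffset) {
        this.committable = startOffset;
    }

    void received(long offset) {
        inFlight.put(offset, false);
    }

    void ack(long offset) {
        inFlight.replace(offset, true);
        // Advance over the acknowledged prefix; stop at the first record still pending.
        while (!inFlight.isEmpty() && inFlight.firstEntry().getValue()) {
            committable = inFlight.pollFirstEntry().getKey() + 1;
        }
    }

    long committableOffset() {
        return committable;
    }

    int pending() {
        return inFlight.size();
    }
}
```

Acknowledging later records does not help while an earlier one is outstanding; the committable offset only jumps forward once the oldest pending record is acknowledged.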

The messages are not getting deleted from the file system when deleteRecords Kafka Admin Client Java API is invoked

I was trying to delete messages from my Kafka topic using the Java Admin Client API's deleteRecords method. These are the steps I tried:
1. I pushed 20000 records to my TEST-DELETE topic
2. Started a console consumer and consumed all the messages
3. Invoked my java program to delete all those 20k messages
4. Started another console consumer with a different group id. This consumer is not receiving any of the deleted messages
When I checked the file system, I could still see all those 20k records occupying disk space. I want those records to be deleted permanently from the file system too.
My topic configuration and server.properties settings are given below:
Topic:TEST-DELETE PartitionCount:4 ReplicationFactor:1 Configs:cleanup.policy=delete
Topic: TEST-DELETE Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Topic: TEST-DELETE Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: TEST-DELETE Partition: 2 Leader: 0 Replicas: 0 Isr: 0
Topic: TEST-DELETE Partition: 3 Leader: 0 Replicas: 0 Isr: 0
log.retention.hours=24
log.retention.check.interval.ms=60000
log.cleaner.delete.retention.ms=60000
file.delete.delay.ms=60000
delete.retention.ms=60000
offsets.retention.minutes=5
offsets.retention.check.interval.ms=60000
log.cleaner.enable=true
log.cleanup.policy=compact,delete
My delete code is given below
public void deleteRecords(Map<String, Map<Integer, Long>> allTopicPartions) {
Map<TopicPartition, RecordsToDelete> recordsToDelete = new HashMap<>();
allTopicPartions.entrySet().forEach(topicDetails -> {
String topicName = topicDetails.getKey();
Map<Integer, Long> value = topicDetails.getValue();
value.entrySet().forEach(partitionDetails -> {
if (partitionDetails.getValue() != 0) {
recordsToDelete.put(new TopicPartition(topicName, partitionDetails.getKey()),
RecordsToDelete.beforeOffset(partitionDetails.getValue()));
}
});
});
DeleteRecordsResult deleteRecords = this.client.deleteRecords(recordsToDelete);
Map<TopicPartition, KafkaFuture<DeletedRecords>> lowWatermarks = deleteRecords.lowWatermarks();
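lowWatermarks.entrySet().forEach(entry -> {
try {
logger.info(entry.getKey().topic() + " " + entry.getKey().partition() + " "
+ entry.getValue().get().lowWatermark());
} catch (Exception ex) {
// Do not swallow failures: a failed deletion would otherwise go unnoticed
logger.error("deleteRecords failed for " + entry.getKey(), ex);
}
});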
}
The output of my Java program:
2019-06-25 16:21:15 INFO MyKafkaAdminClient:247 - TEST-DELETE 1 5000
2019-06-25 16:21:15 INFO MyKafkaAdminClient:247 - TEST-DELETE 0 5000
2019-06-25 16:21:15 INFO MyKafkaAdminClient:247 - TEST-DELETE 3 5000
2019-06-25 16:21:15 INFO MyKafkaAdminClient:247 - TEST-DELETE 2 5000
My intention is to delete the consumed records from the file system as I am working with limited storage for my kafka broker.
I would like some help with the following doubts:
I was under the impression that deleteRecords would also remove the messages from the file system, but it looks like I got it wrong.
How long will those deleted records remain in the log directory?
Is there any specific configuration I need in order to remove the records from the file system once deleteRecords is invoked?
The recommended approach to handle this is to set retention.ms and related configuration values for the topics you're interested in. That way, you can define how long Kafka will store your data before deleting it, making sure all your downstream consumers have had the chance to pull down the data before it's deleted from the Kafka cluster.
If, however, you still want Kafka to delete data based on size, there are the log.retention.bytes and retention.bytes configuration values. The first is the broker-wide default; the second is the topic-level setting, which defaults to the broker value but can be overridden per topic. retention.bytes is enforced per partition, so the total size retained for a topic is roughly that value multiplied by the number of partitions.
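As a rough worked example of the per-partition rule: the hypothetical helper below multiplies retention.bytes by the partition count, and a second method adds a pessimistic allowance of one extra segment per partition (retention removes whole segments, so a partition can hold somewhat more than retention.bytes) and multiplies by the replication factor, since every replica stores its own copy. Treat it as an estimate under those assumptions, not an exact formula.

```java
/** Hypothetical sizing helper for size-based retention (estimates only). */
class RetentionSizing {

    /** retention.bytes applies per partition, so a topic keeps roughly this much per replica set. */
    static long approxTopicBytes(long retentionBytes, int partitions) {
        return Math.multiplyExact(retentionBytes, (long) partitions);
    }

    /**
     * Pessimistic cluster-wide estimate: up to about one extra segment per partition,
     * times the number of replicas that each store a full copy.
     */
    static long approxWorstCaseClusterBytes(long retentionBytes, long segmentBytes,
                                            int partitions, int replicationFactor) {
        long perPartition = Math.addExact(retentionBytes, segmentBytes);
        long perReplicaSet = Math.multiplyExact(perPartition, (long) partitions);
        return Math.multiplyExact(perReplicaSet, (long) replicationFactor);
    }
}
```

For the 4-partition, replication-factor-1 TEST-DELETE topic with 1 GiB for both retention.bytes and segment.bytes, this gives 4 GiB of retained data and a worst case of about 8 GiB on disk.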
Be aware, however, that if a runaway producer suddenly starts generating a lot of data while you have a hard byte limit, you might wipe out days' worth of data and be left with only the last few minutes, possibly before legitimate consumers have been able to read it. This is why time-based retention is usually a much better choice for Kafka topics than byte-based retention.
You can find the configuration properties and their explanation in the official Kafka docs: https://kafka.apache.org/documentation/
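On why deleteRecords did not free disk space: as I understand it, deleteRecords only moves each partition's log start offset forward (the low watermark printed above as 5000). The broker's periodic retention check later removes a segment file only when every record in it is below that offset, and the active (last) segment is never removed this way. With 5000 small records per partition and the default 1 GiB segment size (the question does not show log.segment.bytes), all of them probably sit in the still-active segment, which would explain why the files stay on disk. A simplified model of that eligibility rule (the class SegmentDeletionModel is hypothetical, not Kafka's code):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of which segments fall entirely below a partition's log start offset. */
class SegmentDeletionModel {

    /**
     * baseOffsets: segment base offsets in ascending order; the last one is the active segment.
     * A segment ends just before the next segment's base offset, so it is deletable only
     * when that next base offset is <= logStartOffset. The active segment is never returned.
     */
    static List<Long> deletableSegments(List<Long> baseOffsets, long logStartOffset) {
        List<Long> deletable = new ArrayList<>();
        for (int i = 0; i + 1 < baseOffsets.size(); i++) {
            if (baseOffsets.get(i + 1) <= logStartOffset) {
                deletable.add(baseOffsets.get(i));
            } else {
                break; // later segments contain records at or above the log start offset
            }
        }
        return deletable;
    }
}
```

With a single segment starting at offset 0, nothing is deletable even after advancing the log start offset to 5000; only once the log has rolled to newer segments do the older files become eligible.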

How to make Kafka broker failover to work regarding consumers?

Making a replicated setup work for consumers seems very complicated: when I stop certain brokers, some consumers stop working, and when that broker is up again, the consumers that had stopped receive all the "missing" messages.
I am using two brokers and created a replicated topic like this:
$KAFKA_HOME/bin/kafka-topics.sh --create \
--zookeeper localhost:2181 \
--replication-factor 2 \
--partitions 3 \
--topic replicated_topic
An excerpt from the server config (it is the same for both servers except for the port, broker id and log dir):
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3
# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs0
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability, such as 3.
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=localhost:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
Let's describe my topic with 2 brokers:
Topic:replicated_topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: replicated_topic Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: replicated_topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 1,0
Topic: replicated_topic Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
Let's see the code for the consumer:
Consumer (implements Callable):
@Override
public Void call() throws Exception {
final Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
bootstrapServers);
props.put(ConsumerConfig.GROUP_ID_CONFIG,
groupId);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
IntegerDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
StringDeserializer.class.getName());
final Consumer<Integer, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList(topicName));
ConsumerRecords<Integer, String> records = null;
while (!Thread.currentThread().isInterrupted()) {
records = consumer.poll(1000);
if (records.isEmpty()) {
continue;
}
records.forEach(rec -> LOGGER.debug("{}#{} consumed from topic {}, partition {} pair ({},{})",
ConsumerCallable.class.getSimpleName(), hashCode(), rec.topic(), rec.partition(), rec.key(), rec.value()));
consumer.commitAsync();
}
consumer.close();
return null;
}
Producer and main code:
private static final String TOPIC_NAME = "replicated_topic";
private static final String BOOTSTRAP_SERVERS = "localhost:9092, localhost:9093";
private static final Logger LOGGER = LoggerFactory.getLogger(Main.class);
public static void main(String[] args) {
ExecutorService executor = Executors.newCachedThreadPool();
executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group1"));
executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group2"));
executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group3"));
try (Producer<Integer, String> producer = createProducer()) {
Scanner scanner = new Scanner(System.in);
String line = null;
LOGGER.debug("Please enter 'k v' on the command line to send to Kafka or 'quit' to exit");
while (scanner.hasNextLine()) {
line = scanner.nextLine();
if (line.trim().toLowerCase().equals("quit")) {
break;
}
String[] elements = line.split(" ");
Integer key = Integer.parseInt(elements[0]);
String value = elements[1];
producer.send(new ProducerRecord<>(TOPIC_NAME, key, value));
producer.flush();
}
}
executor.shutdownNow();
}
private static Producer<Integer, String> createProducer() {
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
BOOTSTRAP_SERVERS);
props.put(ProducerConfig.CLIENT_ID_CONFIG, "KafkaExampleProducer");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
IntegerSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
StringSerializer.class.getName());
return new KafkaProducer<>(props);
}
Now let's see the behaviour:
1. All brokers are up:
Output of kafka topic:
Topic:replicated_topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: replicated_topic Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: replicated_topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 1,0
Topic: replicated_topic Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
Output of program:
12:52:30.460 DEBUG Main - Please enter 'k v' on the command line to send to Kafka or 'quit' to exit
1 u
12:52:35.555 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 0 pair (1,u)
12:52:35.559 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 0 pair (1,u)
12:52:35.559 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 0 pair (1,u)
2 d
12:52:38.096 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 2 pair (2,d)
12:52:38.098 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 2 pair (2,d)
12:52:38.100 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 2 pair (2,d)
Since the consumers are in different groups, every message is delivered to all of them; everything is OK.
2. Bring down broker 2:
Describe topic:
Topic:replicated_topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: replicated_topic Partition: 0 Leader: 0 Replicas: 1,0 Isr: 0
Topic: replicated_topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0
Topic: replicated_topic Partition: 2 Leader: 0 Replicas: 1,0 Isr: 0
Output of program:
3 t
12:57:03.898 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 1 pair (3,t)
4 p
12:57:06.058 DEBUG ConsumerCallable - ConsumerCallable#186743616 consumed from topic replicated_topic, partition 1 pair (4,p)
Now only 1 consumer receives data. Let's bring up broker 2 again:
Now the other 2 consumers receive the missing data:
12:57:50.863 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 1 pair (3,t)
12:57:50.863 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 1 pair (4,p)
12:57:50.870 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 1 pair (3,t)
12:57:50.870 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 1 pair (4,p)
3. Bring down broker 1:
Now only 2 consumers receive data:
5 c
12:59:13.718 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 2 pair (5,c)
12:59:13.737 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 2 pair (5,c)
6 s
12:59:16.437 DEBUG ConsumerCallable - ConsumerCallable#1361430455 consumed from topic replicated_topic, partition 2 pair (6,s)
12:59:16.438 DEBUG ConsumerCallable - ConsumerCallable#1241910294 consumed from topic replicated_topic, partition 2 pair (6,s)
If I bring it back up, the other consumer also receives the missing data.
My point (sorry for the long post, but I am trying to capture the context) is: how can I make sure that, no matter which broker I stop, the consumers keep working correctly and receive all messages normally?
PS: I tried setting the offsets.topic.replication.factor=2 or 3, but it didn't have any effect.
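Messages will not be rejected when the number of live brokers is lower than the configured replication factor, and when a broker that holds a replica rejoins the cluster, it catches up by replicating the data it missed. https://stackoverflow.com/a/38998062/6274525
So when your broker 2 goes down, the messages are still written, because every partition has a replica on the remaining live broker (replication factor 2). Your other 2 consumers stop for a different reason: each consumer group is managed by a group coordinator, which is the broker leading the __consumer_offsets partition that the group maps to. If __consumer_offsets is not replicated and the partitions for those two groups live on broker 2, those groups have no coordinator while broker 2 is down, so their consumers cannot join their groups and consume.
When broker 2 is up again, the coordinator is available again, the consumers rejoin their groups and resume from their last committed offsets, which is why they then receive the "missing" messages.
So make sure the __consumer_offsets topic is actually replicated by setting offsets.topic.replication.factor to a value greater than 1. It cannot exceed the number of brokers (so at most 2 in this two-broker setup); 3 is the usual recommendation once you have at least 3 brokers. Note that this setting is only applied when __consumer_offsets is first created, which is likely why changing it later had no effect; an existing __consumer_offsets topic needs its replication factor increased with a partition reassignment (kafka-reassign-partitions.sh).
This property sets the replication factor of the internal __consumer_offsets topic, which Kafka creates automatically and which stores committed offsets and consumer group metadata. If that topic has a single replica, losing the broker that holds a group's partition leaves that group without a coordinator.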
Link to detail of this property : https://kafka.apache.org/documentation/#brokerconfigs
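To see which __consumer_offsets partition (and therefore which coordinator) each of the three groups depends on, the mapping can be sketched as below. It mirrors, as far as I know, the formula Kafka uses, abs(groupId.hashCode()) % offsets.topic.num.partitions with a default of 50 partitions; the class CoordinatorPartition is hypothetical and written for illustration, not copied from Kafka.

```java
/** Hypothetical sketch of mapping a consumer group id to a __consumer_offsets partition. */
class CoordinatorPartition {

    /** The leader of the returned partition acts as the group's coordinator. */
    static int partitionFor(String groupId, int offsetsTopicPartitions) {
        int h = groupId.hashCode();
        // Kafka's abs helper maps Integer.MIN_VALUE to 0 instead of overflowing
        int abs = (h == Integer.MIN_VALUE) ? 0 : Math.abs(h);
        return abs % offsetsTopicPartitions;
    }
}
```

With the default 50 partitions, group1, group2 and group3 map to partitions 40, 39 and 38. Whether those partitions are led by the same broker depends on how __consumer_offsets was laid out, which is why stopping one broker can stall some groups but not others when that topic has a single replica.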

Read 1 message concurrently from multiple Kafka topics

I set the concurrency to 1 for my Kafka listener:
ConcurrentKafkaListenerContainerFactory<String, Map<String, Object>>
factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConcurrency(conncurrency);
factory.setConsumerFactory(consumerFactory());
factory.setRetryTemplate(retryTemplate());
I am listening to 3 different topics:
@KafkaListener(topics = "#{'${kafka.consumer.topic.name}'.split(',')}", containerFactory = "kafkaListenerContainerFactory")
public void listen(@Payload Map<String, Object> conciseMap,
@Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition,
@Header(KafkaHeaders.OFFSET) int offset,
Acknowledgment ack) {
processMessage(conciseMap,partition,offset,ack,false);
}
In this case, will the listener read one message from the first topic and, once it is processed, read one message from the next topic, and so on? Or will it process one message from each topic concurrently?
If it is the former, is there a way to read one message concurrently from all the topics without creating multiple listeners?
There is no guarantee how Kafka will allocate the partitions across the container threads; if each topic has only one partition, they will probably all be allocated to the same container thread. That is what happened when I ran a test with container concurrency=3:
2017-10-31 16:40:26.066 INFO 35202 --- [ntainer#0-2-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[]
2017-10-31 16:40:26.066 INFO 35202 --- [ntainer#0-1-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[]
2017-10-31 16:40:26.079 INFO 35202 --- [ntainer#0-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[bar-0, foo-0, baz-0]
With 10 partitions per topic, I got this distribution...
2017-10-31 16:46:19.279 INFO 35900 --- [ntainer#0-1-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[foo10-5, foo10-6, foo10-4, baz10-5, baz10-4, baz10-6, bar10-5, bar10-4, bar10-6]
2017-10-31 16:46:19.279 INFO 35900 --- [ntainer#0-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[bar10-1, bar10-0, bar10-3, bar10-2, baz10-1, baz10-0, baz10-3, baz10-2, foo10-3, foo10-1, foo10-2, foo10-0]
2017-10-31 16:46:19.279 INFO 35900 --- [ntainer#0-2-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[baz10-9, baz10-8, baz10-7, bar10-9, bar10-8, foo10-9, bar10-7, foo10-7, foo10-8]
As you can see, some partitions from each topic were allocated to each thread. But two of the threads got 9 partitions total while one got 12.
If you want complete control, I would suggest a listener per topic.
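Both distributions above are consistent with range-style assignment, where each topic is split separately and the first consumer takes any remainder: three single-partition topics all land on thread 0, and 10 partitions over 3 threads gives 4 + 3 + 3 per topic, i.e. 12, 9 and 9 in total. A simplified model comparing that with round-robin assignment (the class AssignmentSketch is hypothetical; it is not the actual RangeAssignor or RoundRobinAssignor code, and it numbers consumers 0..n-1 instead of sorting real member ids):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of range-style vs round-robin partition assignment. */
class AssignmentSketch {

    /** Range style: each topic is split into contiguous ranges; earlier consumers get the remainder. */
    static Map<Integer, List<String>> range(Map<String, Integer> partitionsPerTopic, int consumers) {
        Map<Integer, List<String>> result = emptyAssignment(consumers);
        for (Map.Entry<String, Integer> topic : new TreeMap<>(partitionsPerTopic).entrySet()) {
            int base = topic.getValue() / consumers;
            int extra = topic.getValue() % consumers;
            int partition = 0;
            for (int c = 0; c < consumers; c++) {
                int count = base + (c < extra ? 1 : 0);
                for (int i = 0; i < count; i++) {
                    result.get(c).add(topic.getKey() + "-" + partition++);
                }
            }
        }
        return result;
    }

    /** Round-robin style: all topic-partitions are dealt out one at a time across consumers. */
    static Map<Integer, List<String>> roundRobin(Map<String, Integer> partitionsPerTopic, int consumers) {
        Map<Integer, List<String>> result = emptyAssignment(consumers);
        int next = 0;
        for (Map.Entry<String, Integer> topic : new TreeMap<>(partitionsPerTopic).entrySet()) {
            for (int p = 0; p < topic.getValue(); p++) {
                result.get(next).add(topic.getKey() + "-" + p);
                next = (next + 1) % consumers;
            }
        }
        return result;
    }

    private static Map<Integer, List<String>> emptyAssignment(int consumers) {
        Map<Integer, List<String>> assignment = new TreeMap<>();
        for (int c = 0; c < consumers; c++) {
            assignment.put(c, new ArrayList<>());
        }
        return assignment;
    }
}
```

With the real consumer, the strategy is selected through the partition.assignment.strategy consumer property (for example org.apache.kafka.clients.consumer.RoundRobinAssignor), which would spread single-partition topics across the container threads.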
You don't need to create multiple listeners; you only need a concurrency at least as large as the total number of partitions across all the topics. Keep in mind, though, that how those partitions are spread over the threads depends on the consumer's partition assignment strategy.
The factory will then start that many KafkaMessageListenerContainer instances, each running in its own thread, and you can still use the same @KafkaListener method. As long as that method is stateless, concurrency is not a problem.

Storm KafkaSpout stopped to consume messages from Kafka Topic

My problem is that the Storm KafkaSpout stops consuming messages from a Kafka topic after a period of time. With debug enabled in Storm, I get log output like this:
2016-07-05 03:58:26.097 o.a.s.d.task [INFO] Emitting: packet_spout __metrics [#object[org.apache.storm.metric.api.IMetricsConsumer$TaskInfo 0x2c35b34f "org.apache.storm.metric.api.IMetricsConsumer$TaskInfo#2c35b34f"] [#object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x798f1e35 "[__ack-count = {default=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x230867ec "[__sendqueue = {sojourn_time_ms=0.0, write_pos=5411461, read_pos=5411461, overflow=0, arrival_rate_secs=0.0, capacity=1024, population=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x7cdec8eb "[__complete-latency = {default=0.0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x658fc59 "[__skipped-max-spout = 0]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x3c1f3a50 "[__receive = {sojourn_time_ms=4790.5, write_pos=2468305, read_pos=2468304, overflow=0, arrival_rate_secs=0.20874647740319383, capacity=1024, population=1}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x262d7906 "[__skipped-inactive = 0]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x73648c7e "[kafkaPartition = {Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPICallCount=0, Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPILatencyMax=null, Partition{host=slave103:9092, topic=packet, partition=12}/lostMessageCount=0, Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPILatencyMean=null, Partition{host=slave103:9092, topic=packet, partition=12}/fetchAPIMessageCount=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x4e43df61 "[kafkaOffset = {packet/totalLatestCompletedOffset=154305947, packet/partition_12/spoutLag=82472754, packet/totalEarliestTimeOffset=233919465, packet/partition_12/earliestTimeOffset=233919465, packet/partition_12/latestEmittedOffset=154307691, packet/partition_12/latestTimeOffset=236778701, 
packet/totalLatestEmittedOffset=154307691, packet/partition_12/latestCompletedOffset=154305947, packet/totalLatestTimeOffset=236778701, packet/totalSpoutLag=82472754}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x49fe816b "[__transfer-count = {__ack_init=0, default=0, __metrics=0}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x63e2bdc0 "[__fail-count = {}]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x3b17bb7b "[__skipped-throttle = 1086120]"] #object[org.apache.storm.metric.api.IMetricsConsumer$DataPoint 0x1315a68c "[__emit-count = {__ack_init=0, default=0, __metrics=0}]"]]]
2016-07-05 03:58:55.042 o.a.s.d.executor [INFO] Processing received message FOR -2 TUPLE: source: __system:-1, stream: __tick, id: {}, [30]
2016-07-05 03:59:25.042 o.a.s.d.executor [INFO] Processing received message FOR -2 TUPLE: source: __system:-1, stream: __tick, id: {}, [30]
2016-07-05 03:59:25.946 o.a.s.d.executor [INFO] Processing received message FOR -2 TUPLE: source: __system:-1, stream: __metrics_tick, id: {}, [60]
My test topology is really simple: one KafkaSpout and one Counter bolt. When the topology works fine, the value between FOR and TUPLE is a positive number; when the topology stops consuming messages, the value becomes negative. What causes "Processing received message FOR -2 TUPLE", and how can I fix it?
By the way, my experiment environment is:
OS: Red Hat Enterprise Linux Server release 7.0 (Maipo)
Kafka: 0.10.0.0
Storm: 1.0.1
With help from the Storm mailing list, I was able to tune the KafkaSpout and resolve the issue. The following settings work for me:
config.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 2048);
config.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);
config.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
config.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
I tested by sending bursts of 20k-50k messages with a 1-second pause between bursts. Each message was 2048 bytes.
I am running a 3-node cluster; my topology has 4 spouts and the topic has 64 partitions.
After 200M messages it's still working.
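Config.TOPOLOGY_MAX_SPOUT_PENDING caps the number of emitted but not yet acked tuples per spout task, so the total in flight grows with the number of spout tasks. A back-of-the-envelope helper (hypothetical, assuming the 4 spouts above run as 4 spout tasks and ignoring tuple and serialization overhead):

```java
/** Hypothetical estimate of in-flight load implied by max spout pending. */
class SpoutPendingEstimate {

    /** Upper bound on tuples emitted but not yet acked across all spout tasks. */
    static long maxInFlightTuples(int maxSpoutPendingPerTask, int spoutTasks) {
        return (long) maxSpoutPendingPerTask * spoutTasks;
    }

    /** The same bound expressed as raw payload bytes. */
    static long maxInFlightPayloadBytes(int maxSpoutPendingPerTask, int spoutTasks, int messageBytes) {
        return maxInFlightTuples(maxSpoutPendingPerTask, spoutTasks) * messageBytes;
    }
}
```

With a max spout pending of 2048, 4 spout tasks and 2048-byte messages, that is at most 8192 tuples, or about 16 MiB of payload, in flight at once.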
Check if the producer is actually writing to the topic you expect.
Make sure that the spouts can reach Kafka at the network level; you can check this with telnet.
Make sure the spouts can reach ZooKeeper, again using telnet.
Source: KafkaSpout is not receiving anything from Kafka
If all three checks pass, then:
Kafka has a fixed retention window for topics; once messages fall outside that window, it deletes the oldest ones.
So here is what might be happening: you are pushing data to Kafka faster than the consumers can consume it, so the spout falls behind and messages it has not read yet are deleted by retention.
Source: Storm-kafka spout not fast enough to process the information
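The kafkaOffset metrics in the question's log are consistent with this: for partition 12, earliestTimeOffset (233919465, the oldest offset Kafka still holds) is already higher than latestEmittedOffset (154307691), so the spout appears to be positioned on offsets that retention has already removed. The reported spoutLag of 82472754 also equals latestTimeOffset minus latestCompletedOffset. A small hypothetical helper that performs the same comparisons on those metric values:

```java
/** Hypothetical checks on storm-kafka offset metrics. */
class SpoutOffsetCheck {

    /** True if the spout's position is older than the oldest offset Kafka still retains. */
    static boolean fellBehindRetention(long latestEmittedOffset, long earliestTimeOffset) {
        return latestEmittedOffset < earliestTimeOffset;
    }

    /** Offsets between the spout's position and the oldest retained offset (0 if not behind). */
    static long offsetsLostToRetention(long latestEmittedOffset, long earliestTimeOffset) {
        return Math.max(0L, earliestTimeOffset - latestEmittedOffset);
    }

    /** Lag as it appears in the log: newest offset in Kafka minus last completed offset. */
    static long spoutLag(long latestTimeOffset, long latestCompletedOffset) {
        return latestTimeOffset - latestCompletedOffset;
    }
}
```

For the logged values this reports that the spout fell behind retention by 79611774 offsets, which supports the retention explanation above.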

Categories