Need to fetch messages from a Kafka topic, from a particular offset
Stuck cause of IllegalStateException exception at assign()
If I do not use assign() , then the consumer does not perform seek, as that being a Lazy operation
Actual purpose: Need to iterate messages at topic from a pre-decided offset till end. This pre-decided offset is calculated at markOffset()
static void fetchMessagesFromMarkedOffset() {
Consumer<Long, String> consumer = ConsumerCreator.createConsumer();
consumer.assign(set); // <---- Exception at this place
map.forEach((k,v) -> {
consumer.seek(k, v-3);
});
ConsumerRecords<Long, String> consumerRecords = consumer.poll(100);
consumerRecords.forEach(record -> {
System.out.println("Record Key " + record.key());
System.out.println("Record value " + record.value());
System.out.println("Record partition " + record.partition());
System.out.println("Record offset " + record.offset());
});
consumer.close();
}
Rest of concerned code involved
public static Set<TopicPartition> set;
public static Map<TopicPartition, Long> map;
static void markOffset() {
Consumer<Long, String> consumer = ConsumerCreator.createConsumer();
consumer.poll(100);
set = consumer.assignment();
map = consumer.endOffsets(set);
System.out.println("Topic Partitions: " + set);
System.out.println("End Offsets: " + map);
}
Consumer Creation
private Consumer createConsumer(String topicName) {
final Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "capacity-service-application");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
final Consumer consumer = new KafkaConsumer(props);
consumer.subscribe(Collections.singletonList(topicName));
return consumer;
}
Exception
Exception in thread "main" java.lang.IllegalStateException: Subscription to topics, partitions and pattern are mutually exclusive
at org.apache.kafka.clients.consumer.internals.SubscriptionState.setSubscriptionType(SubscriptionState.java:104)
at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignFromUser(SubscriptionState.java:157)
at org.apache.kafka.clients.consumer.KafkaConsumer.assign(KafkaConsumer.java:1064)
at com.gaurav.kafka.App.fetchMessagesFromMarkedOffset(App.java:44)
at com.gaurav.kafka.App.main(App.java:30)
You can't mixed manual and automatic partition assignment.
You should use KafkaConsumer::subscribe or KafkaConsumer::assign but not both.
If after calling KafkaConsumer::subscribe you want to switch to manual approach you should first call KafkaConsumer::unsubscribe.
According to https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
Note that it isn't possible to mix manual partition assignment (i.e. using assign) with dynamic partition assignment through topic subscription (i.e. using subscribe).
Related
I'm actually using #KafkaListener to read events in a topic. I want to read 100 events, and after this call Thread.sleep() with a certain time period.
My problem is, when the thread wakes up from sleep, the listener continues on the last event I read, but I want to discard the events when the thread is sleeping and continues with the last events in topic.
Like:
1-100 - Capture
Thread sleeping
101-500
Thread Returns
501 - 601 - Capture
The 101-500 events can be discarded
Code:
#KafkaListener(topics = "topic")
public void consumeBalance(ConsumerRecord<String, String> payload) throws InterruptedException {
this.valorMaximoDeRequest = this.valorMaximoDeRequest + 1;
if (this.valorMaximoDeRequest <= 100) {
log.info("Encontrou evento )");
log.info("Key: " + payload.key() + ", Value:" + payload.value());
log.info("Partition:" + payload.partition() + ",Offset:" + payload.offset());
JsonObject jsonObject = new Gson().fromJson(payload.value(), JsonObject.class);
String accountId = jsonObject.get("accountId").getAsString();
log.info(">>>>>>>>> accountId: " + accountId);
} else {
this.valorMaximoDeRequest = 0;
Thread.sleep(60*1000);
}
}
Kafka config:
#Bean
public Map<String, Object> kafkaFactory() {
Map<String, Object> props = new HashMap<>();
props.put("specific.avro.reader", Boolean.TRUE);
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "brokers");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.GROUP_ID_CONFIG, "1");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put("security.protocol", "SASL_PLAINTEXT");
return props;
}
First, you shouldn't force the listening thread to sleep. The consumer may be considered as dead and trigger a consumer rebalance. You'd better use pause and resume on the consumer. See https://docs.spring.io/spring-kafka/docs/current/reference/html/#pause-resume
Then, if you want to skip the records published when the consumer was asleep, you'll have to seek (seekToBeginning) when the consumer awakes. See https://docs.spring.io/spring-kafka/docs/current/reference/html/#seek
However it is not simple: The Kafka Consumer doesn't let you seek when the consumer is active nor when it does not own the partition.
The point of having consumer groups is to keep track of the offsets which have been processed so that the subsequent consumption resumes from there and also distribute load across different consumers.
If your use-case doesn't need any of the above, you can use consumer.assign() which doesn't leverage the group management functionality.
#KafkaListener(topicPartitions = #TopicPartition(topic = "so56114299",
partitions = "#{#finder.partitions('so56114299')}"))
public void listen(#Header(KafkaHeaders.RECEIVED_MESSAGE_KEY) String key, String payload) {
System.out.println(key + ":" + payload);
}
Snippet reference: https://stackoverflow.com/a/56114699/2534090
Alternatively, you can write your own KafkaConsumer and manually call the consumer.assign() for assigning the partitions.
To answer your original question, for seeking, you need to call consumer.seekToEnd() method every time your method wakes from Thread.sleep(). The subsequent poll will fetch the records from the end offset. Looks like you can add Consumer as parameter to your #KafkaListener method.
I have the following code to connect to Kafka
Properties props = new Properties();
props.put("bootstrap.servers", "myconfluentkafkabroker:9092");
props.put("group.id","test");
props.put("enable.auto.commit","true");
props.put("auto.commit.interval.ms","1000");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my_CG");
props.put("group.instance.id", "my_instance_CG_id");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put("key.deserializer", Class.forName("org.apache.kafka.common.serialization.StringDeserializer"));
props.put("value.deserializer", Class.forName("org.apache.kafka.common.serialization.StringDeserializer"));
KafkaConsumer<String,String> consumer = new KafkaConsumer<String,String>(props);
consumer.subscribe(Arrays.asList("MyTopic"));
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records)
{
log.debug("topic = %s, partition = %d, offset = %d,"
customer = %s, country = %s\n",
record.topic(), record.partition(), record.offset(),
record.key(), record.value());
int updatedCount = 1;
if (custCountryMap.countainsKey(record.value())) {
updatedCount = custCountryMap.get(record.value()) + 1;
}
custCountryMap.put(record.value(), updatedCount)
JSONObject json = new JSONObject(custCountryMap);
System.out.println(json.toString(4));
}
}
} finally {
consumer.close();
}
Code didn't throw any errors but I still don't see the consumer listed
would this be an issue?
props.put("group.instance.id", "my_instance_CG_id");
You should verify the information that you see with the built-in tools that Kafka provides like kafka-consumer-groups.sh
You'll also need to actually poll messages and commit offsets, not just subscribe before you will see anything.
Otherwise, for that specific Control Center dashboard, it may require you to add the Monitoring Interceptors into your client
I am trying to read the last 3 records from the topic "input_topic".
I am using only a single consumer.
But it is consuming record from only one partition.
When I manually assigned other partitions, an error comes "You can only check the position for partitions assigned to this consumer."
But I am using a single consumer.
I am not able to understand the issue.
Please help me out if possible.
Properties properties = new Properties();
properties.setProperty(ConsumerConfig.CLIENT_ID_CONFIG,"4");
properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"localhost:9092");
properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"earliest");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
String topic = "input_topic";
TopicPartition topicPartition = new TopicPartition(topic, 0);
TopicPartition topicPartition1 = new TopicPartition(topic, 1);
TopicPartition topicPartition2 = new TopicPartition(topic, 2);
List<TopicPartition> topics = Arrays.asList(topicPartition1,topicPartition,topicPartition2);
while (true) {
Thread.sleep(5000);
consumer.assign(topics);
consumer.seekToEnd(topics);
long current = consumer.position(topicPartition);
consumer.seek(topicPartition, current-3);
ConsumerRecords<String, String> records = consumer.poll(100);
System.out.println("-------------------------------------------> "+ records.count());
System.out.println("-------------------------------------------> "+ LocalDateTime.now());
for (ConsumerRecord<String, String> record : records) {
System.out.printf("offset = %d, key = %s, value = %s", record.offset(), record.key(), record.value());
System.out.println("_________________________" + record.partition());
}
}
My guess ... other than what Hatice said about assign which should be made outside of the loop just one time, I see this from your code.
You seek the position at the end of all topic partitions but then you seek on the offset for the last 3 records just for the topic partition 0.
At that point, the poll is able to consume just only those 3 records from topic partition 0 and not from other partitions because your position on them is at end (of course, it's true if you are not sending more messages to those partitions as well).
I need to pull data from Kafka consumer to pass it on to my application. Below is the code that I have written to access the consumer:
public class ConsumerGroup {
public static void main(String[] args) throws Exception {
String topic = "kafka_topic";
String group = "0";
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", group);
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("auto.offset.reset", "earliest");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList(topic));
System.out.println("Subscribed to topic: " + topic);
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records)
System.out.printf("offset = %d, key = %s, value = %s\n", record.offset(), record.key(), record.value());
}
}
}
When I run this code, sometimes the data is getting generated and sometimes no data is generated. Why this behavior is inconsistent? Is there any issue with my code?
Your code is Ok. You have autocommit option enabled, so after you read the records, they are automatically committed to Kafka. Every time when you run the code you start from the last processed offset, which is stored in __consumer_offsets topic. So you always read only the new records, which have arrived to Kafka after last run. To print the data constantly in the consumer app, you should put constantly new records into your topic.
I am a student researching and playing around with Kafka. After following the examples on the Apache documentation, I'm playing around with the examples portion in the trunk of their current Github repo.
As of right now, the example implements an 'older' version of their Consumer and does not employ the new KafkaConsumer. Following the documentation, I have written my own version of the KafkaConsumer thinking that it would be faster.
This is a vague question, but on runthrough I produce 5000 simple messages such as "Message_CurrentMessageNumber" to a topic "test" and then use my consumer to fetch these messages and print them to stdout. When I run the example code replacing the provided consumer with the newer KafkaConsumer (v 0.8.2 and up) it works pretty quickly and comparably to the example in its first runthrough, but slows down considerably anytime after that.
I notice that my Kafka Server outputs
Rebalancing group group1 generation 3 (kafka.coordinator.ConsumerCoordinator)
or similar messages often which leads me to believe that Kafka has to do some sort of load balancing that slows stuff down but I was wondering if anyone else had insight as to what I am doing wrong.
public class AlternateConsumer extends Thread {
private final KafkaConsumer<Integer, String> consumer;
private final String topic;
private final Boolean isAsync = false;
public AlternateConsumer(String topic) {
Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("group.id", "newestGroup");
properties.put("partition.assignment.strategy", "roundrobin");
properties.put("enable.auto.commit", "true");
properties.put("auto.commit.interval.ms", "1000");
properties.put("session.timeout.ms", "30000");
properties.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
consumer = new KafkaConsumer<Integer, String>(properties);
consumer.subscribe(topic);
this.topic = topic;
}
public void run() {
while (true) {
ConsumerRecords<Integer, String> records = consumer.poll(100);
for (ConsumerRecord<Integer, String> record : records) {
System.out.println("We received message: " + record.value() + " from topic: " + record.topic());
}
}
// ConsumerRecords<Integer, String> records = consumer.poll(0);
// for (ConsumerRecord<Integer, String> record : records) {
// System.out.println("We received message: " + record.value() + " from topic: " + record.topic());
// }
// consumer.close();
}
}
To start:
package kafka.examples;
public class KafkaConsumerProducerDemo implements KafkaProperties
{
public static void main(String[] args) {
final boolean isAsync = args.length > 0 ? !args[0].trim().toLowerCase().equals("sync") : true;
Producer producerThread = new Producer("test", isAsync);
producerThread.start();
AlternateConsumer consumerThread = new AlternateConsumer("test");
consumerThread.start();
}
}
The producer is the default producer located here: https://github.com/apache/kafka/blob/trunk/examples/src/main/java/kafka/examples/Producer.java
This should not be the case. If the setup is similar between your two consumers you should expect better result with new consumer unless there is issue in the client/consumer implementation, which seems to be the case here.
Can you share your benchmark results and the frequency of reported rebalancing and/or any pattern (i.e. sluggish once at startup, after fixed message consumption, after the queue is drained, etc) you are observing. Also if you can share some details about your consumer implementation.