Get last event in Kafka - JAVA

I'm currently using @KafkaListener to read events from a topic. I want to read 100 events and then call Thread.sleep() for a certain period.
My problem is that when the thread wakes up, the listener continues from the last event I read. I want to discard the events that arrived while the thread was sleeping and continue with the latest events in the topic.
For example:
1-100 - Capture
Thread sleeping
101-500
Thread Returns
501 - 601 - Capture
The 101-500 events can be discarded
Code:
@KafkaListener(topics = "topic")
public void consumeBalance(ConsumerRecord<String, String> payload) throws InterruptedException {
    this.valorMaximoDeRequest = this.valorMaximoDeRequest + 1;
    if (this.valorMaximoDeRequest <= 100) {
        log.info("Found event");
        log.info("Key: " + payload.key() + ", Value:" + payload.value());
        log.info("Partition:" + payload.partition() + ",Offset:" + payload.offset());
        JsonObject jsonObject = new Gson().fromJson(payload.value(), JsonObject.class);
        String accountId = jsonObject.get("accountId").getAsString();
        log.info(">>>>>>>>> accountId: " + accountId);
    } else {
        this.valorMaximoDeRequest = 0;
        Thread.sleep(60*1000);
    }
}
Kafka config:
@Bean
public Map<String, Object> kafkaFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put("specific.avro.reader", Boolean.TRUE);
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "brokers");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "1");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
    props.put("security.protocol", "SASL_PLAINTEXT");
    return props;
}

First, you shouldn't force the listener thread to sleep. If the listener does not return to poll() within max.poll.interval.ms, the consumer may be considered dead, which triggers a consumer rebalance. It is better to pause and resume the consumer instead. See https://docs.spring.io/spring-kafka/docs/current/reference/html/#pause-resume
Then, if you want to skip the records published while the consumer was paused, you'll have to seek to the end of the partitions (seekToEnd) when the consumer resumes. See https://docs.spring.io/spring-kafka/docs/current/reference/html/#seek
However, it is not simple: the KafkaConsumer is not thread-safe, so the seek has to be performed on the consumer's own thread, and it only works for partitions that are currently assigned to that consumer.

The point of having consumer groups is to keep track of the offsets which have been processed so that the subsequent consumption resumes from there and also distribute load across different consumers.
If your use-case doesn't need any of the above, you can use consumer.assign() which doesn't leverage the group management functionality.
@KafkaListener(topicPartitions = @TopicPartition(topic = "so56114299",
        partitions = "#{@finder.partitions('so56114299')}"))
public void listen(@Header(KafkaHeaders.RECEIVED_MESSAGE_KEY) String key, String payload) {
    System.out.println(key + ":" + payload);
}
Snippet reference: https://stackoverflow.com/a/56114699/2534090
Alternatively, you can write your own KafkaConsumer and manually call the consumer.assign() for assigning the partitions.
To answer your original question: to skip the backlog, call consumer.seekToEnd() every time your method wakes from Thread.sleep(). The subsequent poll will fetch records from the end offset. It looks like you can add the Consumer as a parameter to your @KafkaListener method.
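The effect of seeking to the end after each pause can be sketched with a toy in-memory model. This is plain Java, not Kafka client code: ToyLog, ToyConsumer and their methods are hypothetical stand-ins, with seekToEnd() modelled as moving the consumer's position to the log's end offset. Under those assumptions, the scenario from the question looks roughly like this:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes imitate the idea of a log end offset and a
// consumer position; they are NOT part of the Kafka client API.
class SeekToEndSketch {

    /** Single-partition log; only the end offset matters for this sketch. */
    static final class ToyLog {
        private long endOffset = 0; // offset the next appended record will receive

        void append(int count) { endOffset += count; }

        long endOffset() { return endOffset; }
    }

    /** Consumer that remembers the offset of the next record it will read. */
    static final class ToyConsumer {
        private final ToyLog log;
        private long position = 0;

        ToyConsumer(ToyLog log) { this.log = log; }

        /** Returns up to max offsets starting at the current position. */
        List<Long> poll(int max) {
            List<Long> offsets = new ArrayList<>();
            while (offsets.size() < max && position < log.endOffset()) {
                offsets.add(position++);
            }
            return offsets;
        }

        /** Skips everything already in the log. */
        void seekToEnd() { position = log.endOffset(); }
    }

    /** Replays the question's scenario and returns the batch read after the pause. */
    static List<Long> batchAfterPause() {
        ToyLog log = new ToyLog();
        ToyConsumer consumer = new ToyConsumer(log);

        log.append(100);
        consumer.poll(100);      // offsets 0..99 are captured

        log.append(400);         // offsets 100..499 arrive while the listener is paused
        consumer.seekToEnd();    // position jumps to 500, so the backlog is never read

        log.append(100);         // offsets 500..599 arrive after the pause
        return consumer.poll(100);
    }

    public static void main(String[] args) {
        List<Long> batch = batchAfterPause();
        System.out.println("first=" + batch.get(0) + " last=" + batch.get(batch.size() - 1));
    }
}
```

With the real client, the corresponding step would be consumer.seekToEnd(consumer.assignment()) on the consumer thread. Records that an earlier poll() has already returned are in the application's hands and are not affected by the seek.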

Related

Redis Streams one message per consumer with Java

I am trying to implement a Java application with Redis Streams where every consumer consumes exactly one message at a time. It should work like a pipeline/queue: every consumer takes exactly one message, processes it, and after finishing takes the next message in the stream that has not been processed so far.
What works is that every message is consumed by exactly one consumer (with xreadgroup).
I started with this tutorial from redislabs
The code:
RedisClient redisClient = RedisClient.create("redis://pw@host:port");
StatefulRedisConnection<String, String> connection = redisClient.connect();
RedisCommands<String, String> syncCommands = connection.sync();
try {
syncCommands.xgroupCreate(XReadArgs.StreamOffset.from(STREAM_KEY, "0-0"), ID_READ_GROUP);
} catch (RedisBusyException redisBusyException) {
System.out.println(String.format("\t Group '%s' already exists", ID_READ_GROUP));
}
System.out.println("Waiting for new messages ");
while (true) {
List<StreamMessage<String, String>> messages = syncCommands.xreadgroup(
Consumer.from(ID_READ_GROUP, ID_WORKER), XReadArgs.StreamOffset.lastConsumed(STREAM_KEY));
if (!messages.isEmpty()) {
System.out.println(messages.size()); //
for (StreamMessage<String, String> message : messages) {
System.out.println(message.getId());
Thread.sleep(5000);
syncCommands.xack(STREAM_KEY, ID_READ_GROUP, message.getId());
}
}
}
My current problem is that a consumer takes more than one message from the queue, and in some situations the other consumers are waiting while one consumer is processing 10 messages at once.
Thanks in advance!

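Note that XREADGROUP accepts a COUNT argument, which limits how many messages a single call returns.
See the Lettuce JavaDoc for xreadgroup: there is an overload that takes an XReadArgs, so you can pass XReadArgs.Builder.count(1) to fetch one message at a time.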

IllegalStateException Subscription to topics, partitions and pattern are mutually exclusive

I need to fetch messages from a Kafka topic, starting from a particular offset.
I am stuck because of an IllegalStateException thrown at assign().
If I do not use assign(), the consumer does not perform the seek, because seek is a lazy operation.
Actual purpose: I need to iterate over the messages in the topic from a predetermined offset to the end. This offset is calculated in markOffset().
static void fetchMessagesFromMarkedOffset() {
Consumer<Long, String> consumer = ConsumerCreator.createConsumer();
consumer.assign(set); // <---- Exception at this place
map.forEach((k,v) -> {
consumer.seek(k, v-3);
});
ConsumerRecords<Long, String> consumerRecords = consumer.poll(100);
consumerRecords.forEach(record -> {
System.out.println("Record Key " + record.key());
System.out.println("Record value " + record.value());
System.out.println("Record partition " + record.partition());
System.out.println("Record offset " + record.offset());
});
consumer.close();
}
Rest of concerned code involved
public static Set<TopicPartition> set;
public static Map<TopicPartition, Long> map;
static void markOffset() {
Consumer<Long, String> consumer = ConsumerCreator.createConsumer();
consumer.poll(100);
set = consumer.assignment();
map = consumer.endOffsets(set);
System.out.println("Topic Partitions: " + set);
System.out.println("End Offsets: " + map);
}
Consumer Creation
private Consumer createConsumer(String topicName) {
final Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "capacity-service-application");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
final Consumer consumer = new KafkaConsumer(props);
consumer.subscribe(Collections.singletonList(topicName));
return consumer;
}
Exception
Exception in thread "main" java.lang.IllegalStateException: Subscription to topics, partitions and pattern are mutually exclusive
at org.apache.kafka.clients.consumer.internals.SubscriptionState.setSubscriptionType(SubscriptionState.java:104)
at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignFromUser(SubscriptionState.java:157)
at org.apache.kafka.clients.consumer.KafkaConsumer.assign(KafkaConsumer.java:1064)
at com.gaurav.kafka.App.fetchMessagesFromMarkedOffset(App.java:44)
at com.gaurav.kafka.App.main(App.java:30)

You can't mix manual and automatic partition assignment.
You should use either KafkaConsumer::subscribe or KafkaConsumer::assign, but not both.
If you want to switch to manual assignment after calling KafkaConsumer::subscribe, you should first call KafkaConsumer::unsubscribe.
According to https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html:
Note that it isn't possible to mix manual partition assignment (i.e. using assign) with dynamic partition assignment through topic subscription (i.e. using subscribe).
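In the question, createConsumer() already calls subscribe(), which is why the later assign() is rejected. The rule can be illustrated with a small toy state holder. This is a sketch written for this answer, not the Kafka client's own code; the class and its Mode enum are hypothetical, and only the exception message is taken from the stack trace above.

```java
// Toy re-creation of the "subscribe or assign, not both" rule; not the Kafka client's own code.
class SubscriptionModeSketch {

    enum Mode { NONE, SUBSCRIBED, ASSIGNED }

    private Mode mode = Mode.NONE;

    private void switchTo(Mode requested) {
        if (mode == Mode.NONE) {
            mode = requested;                 // first call decides the mode
        } else if (mode != requested) {
            throw new IllegalStateException(
                "Subscription to topics, partitions and pattern are mutually exclusive");
        }
    }

    void subscribe()   { switchTo(Mode.SUBSCRIBED); }
    void assign()      { switchTo(Mode.ASSIGNED); }
    void unsubscribe() { mode = Mode.NONE; }
    Mode mode()        { return mode; }

    /** True if calling assign() right after subscribe() is rejected. */
    static boolean assignAfterSubscribeFails() {
        SubscriptionModeSketch consumer = new SubscriptionModeSketch();
        consumer.subscribe();
        try {
            consumer.assign();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(assignAfterSubscribeFails());

        SubscriptionModeSketch consumer = new SubscriptionModeSketch();
        consumer.subscribe();
        consumer.unsubscribe();   // clears the dynamic subscription first
        consumer.assign();        // now accepted
        System.out.println(consumer.mode());
    }
}
```

For the code in the question, this suggests either removing the subscribe() call from createConsumer() when the consumer is only used with assign() and seek(), or calling unsubscribe() before assign().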

How to fetch messages which are uncommitted in Kafka

I have a Java function in which I am trying to fetch the messages that are unread. For example, say the broker holds messages with offsets 0, 1 and 2, which the consumer has already read, and I switch the consumer off for an hour. During that time I produce messages with offsets 3, 4 and 5. When the consumer is started again, it should read from offset 3, not from 0. Instead, it either reads all the messages or reads only the messages produced after the consumer was started. I want to read the messages that are unread or uncommitted.
I tried "auto.offset.reset" = "latest" and "earliest", as well as "enable.auto.commit" = "true" and "false". I also tried commitSync() and commitAsync() before calling close(), but no luck.
public static KafkaConsumer createConsumer() {
Properties properties = new Properties();
properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, Constants.KAFKA_BROKER);
properties.put(ConsumerConfig.GROUP_ID_CONFIG, "testGroup");
properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
properties.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "50");
properties.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1");
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(properties);
consumer.subscribe(Collections.singleton(Constants.TOPIC));
return consumer;
}
public static void main(String[] args) {
System.out.println("");
System.out.println("----------------");
System.out.println("");
System.out.println("KAFKA CONSUMER EXAMPLE");
System.out.println("");
System.out.println("----------------");
System.out.println("");
OffsetAndMetadata offsetAndMetadataInitial = createConsumer().committed(new TopicPartition(Constants.TOPIC, 0));
System.out.println("");
System.out.println("Offset And MetaData Initial : ");
System.out.println(offsetAndMetadataInitial);
System.out.println("");
ConsumerRecords<String, String> consumerRecords = createConsumer().poll(Duration.ofSeconds(2L));
System.out.println("");
System.out.println("Count Consumer Records : " + consumerRecords.count());
System.out.println("");
Iterator<ConsumerRecord<String, String>> itr = consumerRecords.iterator();
Map<TopicPartition, OffsetAndMetadata> partationOffsetMap = new HashMap<>(4);
while (itr.hasNext()) {
ConsumerRecord record = itr.next();
System.out.println("OffSet : " + record.offset());
System.out.println("");
System.out.println("Key : " + record.key());
System.out.println("Value : " + record.value());
System.out.println("Partition : " + record.partition());
System.out.println("--------------------------");
System.out.println("");
}
createConsumer().close();
}
I just want to fetch only the unread messages in the Kafka consumer. Please correct me if I am wrong somewhere. Thanks in advance.

The main problem in your code is that you never close the consumer you used to poll messages, because each call to createConsumer() creates a new KafkaConsumer. As you do not close that consumer and call poll() only once, you never commit the messages you have read.
(With auto-commit, the commit happens within poll() once the auto-commit interval has elapsed, and within close().)
Once you have corrected that, it should work with the following settings:
enable.auto.commit=true (you could also commit manually, but auto-commit is simpler).
auto.offset.reset=earliest (this only has an effect the first time you consume with a given group.id; it decides whether you start from the beginning of the topic or only see messages produced after you started consuming. Once you have consumed and committed with a given group.id, you will always continue from the last offset you committed.)
group.id must not change between restarts, or you will start from the beginning or from the end again, depending on your auto.offset.reset setting.
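The interaction between committed offsets and auto.offset.reset described above can be summarised in a few lines. The following is a simplified model in plain Java, not Kafka client code; startPosition and OffsetReset are hypothetical names, and the model ignores edge cases such as a committed offset that no longer exists on the broker.

```java
import java.util.OptionalLong;

// Simplified model of where a consumer group starts reading a partition.
// Names are illustrative; this is not part of the Kafka client API.
class StartPositionSketch {

    enum OffsetReset { EARLIEST, LATEST }

    static long startPosition(OptionalLong committed, OffsetReset reset,
                              long beginningOffset, long endOffset) {
        if (committed.isPresent()) {
            return committed.getAsLong();   // a committed offset always wins
        }
        // No committed offset for this group: fall back to auto.offset.reset.
        return reset == OffsetReset.EARLIEST ? beginningOffset : endOffset;
    }

    public static void main(String[] args) {
        // Offsets 0..5 exist, so the beginning offset is 0 and the end offset is 6.
        // Offsets 0..2 were processed and committed, i.e. the committed offset is 3.
        System.out.println(startPosition(OptionalLong.of(3), OffsetReset.LATEST, 0, 6));
        // Nothing was ever committed for the group:
        System.out.println(startPosition(OptionalLong.empty(), OffsetReset.EARLIEST, 0, 6));
        System.out.println(startPosition(OptionalLong.empty(), OffsetReset.LATEST, 0, 6));
    }
}
```

The last two calls correspond to the behaviour reported in the question: without a committed offset, the consumer either re-reads everything (earliest) or only sees records produced after it started (latest).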
Hope this helps

kafka producers are very slow

I am new to Kafka and I have a question that I'm not able to resolve.
I have installed Kafka and Zookeeper on my own computer on Windows (not Linux) and I have created a broker with a topic with several partitions (between 6 and 12 partitions).
When I create consumers, they work perfectly and read at a good speed. For the producer, I have created the simple producer that can be seen on many websites. The producer is inside a loop and sends many short messages (about 2000 very short messages).
I can see that the consumers read the 2000 messages very quickly, but the producer sends messages to the broker at roughly 140 to 150 messages per second. As I said before, I'm working on my own laptop (only 1 disk), but when I read about millions of messages per second, I think there must be something I forgot, because I'm light-years away from that.
If I use more producers, the result is worse.
Is it a question of having more brokers on the same node, or something like that? This problem has been assigned to me at my job and I do not have access to a better computer.
The code for creating the producer is
public class Producer {
public void publica(String topic, String strKey, String strValue) {
Properties configProperties = new Properties();
configProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
configProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
configProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
KafkaProducer<String, String> producer = new KafkaProducer<String, String>(configProperties);
ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, strValue);
producer.send(rec);
}
}
and the code for sending messages is (partial):
Producer prod = new Producer();
for (int i = 0; i < 2000; i++)
{
key = String.valueOf(i);
prod.publica("TopicName", key, texto + " - " + key);
// System.out.println(i + " - " + System.currentTimeMillis());
}

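Your publica() method creates a new KafkaProducer for every message, which is expensive. Create the producer once and reuse it every time you need to send a message. Since send() is asynchronous, also call producer.close() when you are done so that buffered records are flushed:
public class Producer {
    private final KafkaProducer<String, String> producer;

    public Producer() {
        Properties configProperties = new Properties();
        configProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        configProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        configProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producer = new KafkaProducer<>(configProperties); // created once, reused for every send
    }

    public void publica(String topic, String strKey, String strValue) {
        ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, strValue);
        producer.send(rec);
    }
}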
Also take a look at the producer and broker configurations available here. There are several options with which you can tune for your application's needs.
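As a rough illustration of the kind of tuning meant here, the sketch below builds a producer configuration with a few batching-related settings. The property keys are standard producer settings, but the values are assumptions chosen for illustration, not benchmarked recommendations, and ProducerTuningSketch is a hypothetical helper class.

```java
import java.util.Properties;

// Illustrative settings only; the values are assumptions, not benchmarked recommendations.
class ProducerTuningSketch {

    static Properties producerProperties() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("linger.ms", "5");        // wait briefly so several records can share a batch
        props.put("batch.size", "32768");   // upper bound, in bytes, for one batch per partition
        props.put("acks", "1");             // wait for the partition leader only
        return props;
    }

    public static void main(String[] args) {
        producerProperties().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

These properties would be passed to the KafkaProducer constructor in place of the minimal configuration. Whether they help depends on the workload, so it is worth measuring before and after.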

Why does my Kafka Consumer consume messages quickly on first run, but slows down considerably in future runs?

I am a student researching and playing around with Kafka. After following the examples on the Apache documentation, I'm playing around with the examples portion in the trunk of their current Github repo.
As of right now, the example implements an 'older' version of their Consumer and does not employ the new KafkaConsumer. Following the documentation, I have written my own version of the KafkaConsumer thinking that it would be faster.
This is a vague question, but on runthrough I produce 5000 simple messages such as "Message_CurrentMessageNumber" to a topic "test" and then use my consumer to fetch these messages and print them to stdout. When I run the example code replacing the provided consumer with the newer KafkaConsumer (v 0.8.2 and up) it works pretty quickly and comparably to the example in its first runthrough, but slows down considerably anytime after that.
I notice that my Kafka Server outputs
Rebalancing group group1 generation 3 (kafka.coordinator.ConsumerCoordinator)
or similar messages often which leads me to believe that Kafka has to do some sort of load balancing that slows stuff down but I was wondering if anyone else had insight as to what I am doing wrong.
public class AlternateConsumer extends Thread {
private final KafkaConsumer<Integer, String> consumer;
private final String topic;
private final Boolean isAsync = false;
public AlternateConsumer(String topic) {
Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("group.id", "newestGroup");
properties.put("partition.assignment.strategy", "roundrobin");
properties.put("enable.auto.commit", "true");
properties.put("auto.commit.interval.ms", "1000");
properties.put("session.timeout.ms", "30000");
properties.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
consumer = new KafkaConsumer<Integer, String>(properties);
consumer.subscribe(topic);
this.topic = topic;
}
public void run() {
while (true) {
ConsumerRecords<Integer, String> records = consumer.poll(100);
for (ConsumerRecord<Integer, String> record : records) {
System.out.println("We received message: " + record.value() + " from topic: " + record.topic());
}
}
// ConsumerRecords<Integer, String> records = consumer.poll(0);
// for (ConsumerRecord<Integer, String> record : records) {
// System.out.println("We received message: " + record.value() + " from topic: " + record.topic());
// }
// consumer.close();
}
}
To start:
package kafka.examples;
public class KafkaConsumerProducerDemo implements KafkaProperties
{
public static void main(String[] args) {
final boolean isAsync = args.length > 0 ? !args[0].trim().toLowerCase().equals("sync") : true;
Producer producerThread = new Producer("test", isAsync);
producerThread.start();
AlternateConsumer consumerThread = new AlternateConsumer("test");
consumerThread.start();
}
}
The producer is the default producer located here: https://github.com/apache/kafka/blob/trunk/examples/src/main/java/kafka/examples/Producer.java

This should not be the case. If the setup is similar between your two consumers, you should expect better results with the new consumer, unless there is an issue in the client/consumer implementation, which seems to be the case here.
Can you share your benchmark results, the frequency of the reported rebalancing, and any pattern you are observing (e.g. sluggish once at startup, after a fixed number of consumed messages, after the queue is drained, etc.)? It would also help if you could share some details about your consumer implementation.
