Retrieve last n messages of Kafka consumer from a particular topic - java

Kafka version: 0.9.0.1
If n = 20, I need to get the last 20 messages of a topic.
I tried with
kafkaConsumer.seekToBeginning();
but it retrieves all the messages. I need only the last 20 messages.
The topic may have hundreds of thousands of records.
public List<JSONObject> consumeMessages(String kafkaTopicName) {
    KafkaConsumer<String, String> kafkaConsumer = null;
    boolean flag = true;
    List<JSONObject> messagesFromKafka = new ArrayList<>();
    int recordCount = 0;
    int i = 0;
    int maxMessagesToReturn = 20;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "project.group.id");
    props.put("max.partition.fetch.bytes", "1048576000");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    kafkaConsumer = new KafkaConsumer<>(props);
    kafkaConsumer.subscribe(Arrays.asList(kafkaTopicName));
    TopicPartition topicPartition = new TopicPartition(kafkaTopicName, 0);
    LOGGER.info("Subscribed to topic " + kafkaConsumer.listTopics());

    while (flag) {
        // will consume all the messages and store in records
        ConsumerRecords<String, String> records = kafkaConsumer.poll(1000);
        kafkaConsumer.seekToBeginning(topicPartition);
        // getting total records count
        recordCount = records.count();
        LOGGER.info("recordCount " + recordCount);

        for (ConsumerRecord<String, String> record : records) {
            if (record.value() != null) {
                if (i >= recordCount - maxMessagesToReturn) {
                    // adding last 20 messages to messagesFromKafka
                    LOGGER.info("kafkaMessage " + record.value());
                    messagesFromKafka.add(new JSONObject(record.value()));
                }
                i++;
            }
        }
        if (recordCount > 0) {
            flag = false;
        }
    }
    kafkaConsumer.close();
    return messagesFromKafka;
}

You can use kafkaConsumer.seekToEnd(Collection<TopicPartition> partitions) to seek to the last offset of the given partition(s). As per the documentation:
"Seek to the last offset for each of the given partitions. This function evaluates lazily, seeking to the final offset in all partitions only when poll(Duration) or position(TopicPartition) are called. If no partitions are provided, seek to the final offset for all of the currently assigned partitions."
Then you can retrieve the position of a particular partition using position(TopicPartition partition).
Then you can subtract 20 from it and use kafkaConsumer.seek(TopicPartition partition, long offset) to get to the most recent 20 messages.
Simply,
kafkaConsumer.seekToEnd(partitionList);
long endPosition = kafkaConsumer.position(topicPartition);
long recentMessagesStartPosition = endPosition - maxMessagesToReturn;
kafkaConsumer.seek(topicPartition, recentMessagesStartPosition);
Now you can retrieve the most recent 20 messages using poll().
This is the basic logic, but if you have multiple partitions, you have to handle those cases as well. I did not try this, but I hope you get the concept.
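Putting it together, a minimal sketch of that approach (assumptions: a single partition 0, a client version where seekToEnd(Collection) and poll(Duration) are available, manual assign() instead of subscribe() so a rebalance cannot undo the seek, and org.json.JSONObject as in the question):

import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.json.JSONObject;

public class LastNConsumer {

    public static List<JSONObject> consumeLastN(String topic, int n, Properties props) {
        List<JSONObject> result = new ArrayList<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assign the partition manually so the seek below is not disturbed by a rebalance.
            TopicPartition tp = new TopicPartition(topic, 0);
            consumer.assign(Collections.singletonList(tp));

            // Jump to the end, read the end offset, then step back n messages
            // (never before offset 0; with retention, beginningOffsets() gives the real low watermark).
            consumer.seekToEnd(Collections.singletonList(tp));
            long endOffset = consumer.position(tp);
            long startOffset = Math.max(0, endOffset - n);
            consumer.seek(tp, startOffset);

            // Poll until the consumer has read up to the end offset captured above.
            while (consumer.position(tp) < endOffset) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    if (record.value() != null) {
                        result.add(new JSONObject(record.value()));
                    }
                }
            }
        }
        return result;
    }
}

With multiple partitions you would repeat the seek per partition and merge the results.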

Related

get the unprocessed message count in spring kafka

We are migrating to Kafka, and I need to create a monitoring POC service that will periodically check the unprocessed message count in the Kafka queue and take some action based on that count. This service must not read or process the messages; designated consumers will do that. On every cron run, this service just needs the count of unprocessed messages present in the queue.
So far I have put this together from multiple examples:
public void stats() throws ExecutionException, InterruptedException {
    Map<String, Object> props = new HashMap<>();
    // list of host:port pairs used for establishing the initial connections to the Kafka cluster
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);

    try (final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(Arrays.asList(topicName));
        while (true) {
            Thread.sleep(1000);
            ConsumerRecords<String, String> records = consumer.poll(1000);
            if (!records.isEmpty()) {
                System.out.println("records is not empty = " + records.count() + " " + records);
            }
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
                Set<TopicPartition> partitions = consumer.assignment();
                //consumer.seekToBeginning(partitions);
                Map<TopicPartition, Long> offsets = consumer.endOffsets(partitions);
                for (TopicPartition partition : offsets.keySet()) {
                    OffsetAndMetadata commitOffset = consumer.committed(new TopicPartition(partition.topic(), partition.partition()));
                    Long lag = commitOffset == null ? offsets.get(partition) : offsets.get(partition) - commitOffset.offset();
                    System.out.println("lag = " + lag);
                    System.out.printf("partition %s is at %d\n", partition.topic(), offsets.get(partition));
                }
            }
        }
    }
}
The code works fine sometimes and sometimes gives wrong output. Please let me know what is wrong.
Don't subscribe to the topic; just create a consumer with the same group to get the endOffsets.
See this answer for an example.
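A sketch of that idea (assuming props contains bootstrap.servers, the same group.id as the real consumers, and the usual deserializers; because this consumer never subscribes or polls, it does not join the group and consumes nothing):

import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LagMonitor {

    // Total unprocessed-message count (consumer lag) for one topic and one consumer group.
    public static long totalLag(String topic, Properties props) {
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Discover the topic's partitions without subscribing.
            List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
                    .map(pi -> new TopicPartition(pi.topic(), pi.partition()))
                    .collect(Collectors.toList());

            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);

            long lag = 0;
            for (TopicPartition tp : partitions) {
                OffsetAndMetadata committed = consumer.committed(tp);
                // No committed offset yet: count everything, as in the question's code.
                long committedOffset = committed == null ? 0 : committed.offset();
                lag += endOffsets.get(tp) - committedOffset;
            }
            return lag;
        }
    }
}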

Read messages from Kafka topic between a range of offsets

I am looking for a way to consume a set of messages from my Kafka topic within a specific offset range (assume my partition has offsets from 200 to 300; I want to consume the messages from offset 250 to 270).
I am using the code below, where I can specify the initial offset, but it consumes all the messages from 250 until the end. Is there any way/attribute available to set an end offset so that consumption stops at that point?
@KafkaListener(id = "KafkaListener",
        topics = "${kafka.topic.name}",
        containerFactory = "kafkaManualAckListenerContainerFactory",
        errorHandler = "${kafka.error.handler}",
        topicPartitions = @TopicPartition(topic = "${kafka.topic.name}",
                partitionOffsets = {
                        @PartitionOffset(partition = "0", initialOffset = "250"),
                        @PartitionOffset(partition = "1", initialOffset = "250")
                }))
You can use seek() in order to force the consumer to start consuming from a specific offset and then poll() until you reach the target end offset.
public void seek(TopicPartition partition, long offset)
"Overrides the fetch offsets that the consumer will use on the next poll(timeout). If this API is invoked for the same partition more than once, the latest offset will be used on the next poll(). Note that you may lose data if this API is arbitrarily used in the middle of consumption, to reset the fetch offsets."
For example, let's assume you want to start from offset 200:
TopicPartition tp = new TopicPartition("myTopic", 0);
Long startOffset = 200L;
Long endOffset = 300L;

List<TopicPartition> topics = Arrays.asList(tp);
consumer.assign(topics);
consumer.seek(tp, startOffset);
Now you just need to keep poll()ing until endOffset is reached:
boolean run = true;
while (run) {
    ConsumerRecords<String, String> records = consumer.poll(1000);
    for (ConsumerRecord<String, String> record : records) {
        // Do whatever you want to do with `record`
        // Check if end offset has been reached
        if (record.offset() == endOffset) {
            run = false;
            break;
        }
    }
}
KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<String, String>(properties);
boolean keepOnReading = true;
int numberOfMessagesRead = 0;

// offset to read the data from
long offsetToReadFrom = 250L;
// partition to read from ("myTopic" is a placeholder for your topic name)
TopicPartition partitionToReadFrom = new TopicPartition("myTopic", 0);

// assign the partition, then seek; seek is mostly used to replay data or fetch a specific message
kafkaConsumer.assign(Arrays.asList(partitionToReadFrom));
kafkaConsumer.seek(partitionToReadFrom, offsetToReadFrom);

while (keepOnReading) {
    ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        numberOfMessagesRead++;
        logger.info("Key: " + record.key() + ", Value: " + record.value());
        logger.info("Partition: " + record.partition() + ", Offset: " + record.offset());
        if (record.offset() == 270L) {
            keepOnReading = false;
            break;
        }
    }
}
I hope this helps you!

Kafka AVRO Consumer: MySQL Decimal to Java Decimal

I'm trying to consume records from a MySQL table which contains 3 columns (Axis, Price, lastname) with datatypes (int, decimal(14,4), varchar(50)) respectively.
I inserted one record with the following data: (1, 5.0000, John).
The following Java code (which consumes the AVRO records from a topic created by a MySQL connector in the Confluent platform) reads the decimal column Price as a java.nio.HeapByteBuffer type, so I can't reach the value of the column when I receive it.
Is there a way to extract or convert the received data to a Java decimal or double data type?
Here is the MySQL Connector properties file:-
{
  "name": "mysql-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "incrementing.column.name": "Axis",
    "tasks.max": "1",
    "table.whitelist": "ticket",
    "mode": "incrementing",
    "topic.prefix": "mysql-",
    "name": "mysql-source",
    "validate.non.null": "false",
    "connection.url": "jdbc:mysql://localhost:3306/ticket?user=user&password=password"
  }
}
Here is the code:-
public static void main(String[] args) throws InterruptedException, IOException {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "io.confluent.kafka.serializers.KafkaAvroDeserializer");
    props.put("schema.registry.url", "http://localhost:8081");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    String topic = "sql-ticket";
    final Consumer<String, GenericRecord> consumer = new KafkaConsumer<String, GenericRecord>(props);
    consumer.subscribe(Arrays.asList(topic));

    try {
        while (true) {
            ConsumerRecords<String, GenericRecord> records = consumer.poll(100);
            for (ConsumerRecord<String, GenericRecord> record : records) {
                System.out.printf("value = %s \n", record.value().get("Price"));
            }
        }
    } finally {
        consumer.close();
    }
}
Alright, so I finally found the solution.
The HeapByteBuffer needs to be converted to a byte[] array. I then used BigInteger, which constructs the value from that byte array, created a BigDecimal from the BigInteger, and set the decimal point with movePointLeft(4), which is the scale (in my case 4). Everything worked as expected.
ByteBuffer buf = (ByteBuffer) record.value().get("Price");
byte[] arr = new byte[buf.remaining()];
buf.get(arr);
BigInteger bi = new BigInteger(1, arr);
BigDecimal bd = new BigDecimal(bi).movePointLeft(4);
System.out.println(bd);
The output matched the value stored in MySQL, as expected (screenshot comparing the consumer output with the MySQL row omitted).
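If you need this in more than one place, the same conversion can be wrapped in a small helper (a sketch only; the scale of 4 comes from the decimal(14,4) column, and in general it is defined by the field's decimal logical type in the Avro schema):

import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

public final class AvroDecimals {

    // Converts the ByteBuffer of an Avro/Connect decimal field into a BigDecimal.
    // The decimal logical type stores the unscaled value as two's-complement bytes,
    // so the signed BigInteger constructor also handles negative values.
    public static BigDecimal toBigDecimal(ByteBuffer buffer, int scale) {
        byte[] bytes = new byte[buffer.remaining()];
        buffer.duplicate().get(bytes);   // duplicate() leaves the original buffer's position untouched
        return new BigDecimal(new BigInteger(bytes), scale);
    }
}

Usage for the Price column: AvroDecimals.toBigDecimal((ByteBuffer) record.value().get("Price"), 4).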

oracle to mongodb data migration using kafka

I am trying to migrate data from Oracle to MongoDB using Kafka. I took a sample record set of 10 million rows with 90 columns; each row is about 5 KB.
I am dividing the data into 10 threads, but one of the threads does not run every time. When I check the data, I see that 1 million records are missing in MongoDB.
main class:
int totalRec = countNoOfRecordsToBeProcessed;
int minRownum = 0;
int maxRownum = 0;
int recInThread = totalRec / 10;

System.out.println("oracle " + new Date());
for (int i = minRownum; i <= totalRec; i = i + recInThread + 1) {
    KafkaThread kth = new KafkaThread(i, i + recInThread, conn);
    Thread th = new Thread(kth);
    th.start();
}
System.out.println("oracle done+ " + new Date());
kafka producer thread class:
JSONObject obj = new JSONObject();
while (rs.next()) {
    int total_rows = rs.getMetaData().getColumnCount();
    for (int i = 0; i < total_rows; i++) {
        obj.put(rs.getMetaData().getColumnLabel(i + 1).toLowerCase(), rs.getObject(i + 1));
    }
    //System.out.println("object->" + serializedObject);
    producer.send(new ProducerRecord<String, String>("oracle_1", obj.toString()));
    obj = new JSONObject();
    //System.out.println(counter++);
}
consumer class:
KafkaConsumer consumer = new KafkaConsumer<>(props);
// subscribe to topic
consumer.subscribe(Arrays.asList(topicName));

MongoClientURI clientURI = new MongoClientURI(mongoURI);
MongoClient mongoClient = new MongoClient(clientURI);
MongoDatabase database = mongoClient.getDatabase(clientURI.getDatabase());
final MongoCollection<Document> collection = database.getCollection(clientURI.getCollection());

while (true) {
    final ConsumerRecords<Long, String> consumerRecords = consumer.poll(10000);
    if (consumerRecords.count() != 0) {
        List<InsertOneModel> list1 = new ArrayList<>();
        consumerRecords.forEach(record -> {
            // System.out.printf("Consumer Record:(%d, %s, %d, %d)\n",
            //         record.key(), record.value(),
            //         record.partition(), record.offset());
            String row = record.value();
            Document doc = Document.parse(row);
            InsertOneModel t = new InsertOneModel<>(doc);
            list1.add(t);
        });
        collection.bulkWrite((List<? extends WriteModel<? extends Document>>) (list1), new BulkWriteOptions().ordered(false));
        consumer.commitAsync();
        list1.clear();
    }
}
My advice: use Kafka Connect JDBC connector to pull the data in, and a Kafka Connect MongoDB sink to push the data out. Otherwise you are just reinventing the wheel. Kafka Connect is part of Apache Kafka.
Getting started with Kafka Connect:
https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/
https://www.confluent.io/blog/blogthe-simplest-useful-kafka-connect-data-pipeline-in-the-world-or-thereabouts-part-2/
https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-3/
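For illustration only, a MongoDB sink configuration might look roughly like this (a sketch assuming the MongoDB Kafka sink connector, com.mongodb.kafka.connect.MongoSinkConnector, is installed; topic, database, collection, and connection details are placeholders):

{
  "name": "mongo-sink",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "tasks.max": "1",
    "topics": "oracle_1",
    "connection.uri": "mongodb://localhost:27017",
    "database": "mydb",
    "collection": "mycollection",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false"
  }
}

A matching JDBC source connector on the Oracle side would replace the hand-written producer threads entirely.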

How to write Kafka Consumer Client in java to consume the messages from multiple brokers?

I was looking for a Java client (Kafka consumer) to consume messages from multiple brokers. Please advise.
Below is the code written to publish messages to multiple brokers using a simple partitioner.
The topic was created with replication factor 2 and 3 partitions.
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster)
{
    List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
    int numPartitions = partitions.size();
    logger.info("Number of Partitions " + numPartitions);
    if (keyBytes == null)
    {
        int nextValue = counter.getAndIncrement();
        List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
        if (availablePartitions.size() > 0)
        {
            int part = toPositive(nextValue) % availablePartitions.size();
            int selectedPartition = availablePartitions.get(part).partition();
            logger.info("Selected partition is " + selectedPartition);
            return selectedPartition;
        }
        else
        {
            // no partitions are available, give a non-available partition
            return toPositive(nextValue) % numPartitions;
        }
    }
    else
    {
        // hash the keyBytes to choose a partition
        return toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }
}
public void publishMessage(String message, String topic)
{
    Producer<String, String> producer = null;
    try
    {
        producer = new KafkaProducer<>(producerConfigs());
        logger.info("Topic to publish the message --" + this.topic);
        for (int i = 0; i < 10; i++)
        {
            producer.send(new ProducerRecord<String, String>(this.topic, message));
            logger.info("Message Published Successfully");
        }
    }
    catch (Exception e)
    {
        logger.error("Exception Occurred " + e.getMessage());
    }
    finally
    {
        producer.close();
    }
}
public Map<String, Object> producerConfigs()
{
    loadPropertyFile();
    Map<String, Object> propsMap = new HashMap<>();
    propsMap.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
    propsMap.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    propsMap.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    propsMap.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, SimplePartitioner.class);
    propsMap.put(ProducerConfig.ACKS_CONFIG, "1");
    return propsMap;
}

public Map<String, Object> consumerConfigs() {
    Map<String, Object> propsMap = new HashMap<>();
    System.out.println("properties.getBootstrap()" + properties.getBootstrap());
    propsMap.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, properties.getBootstrap());
    propsMap.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    propsMap.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, properties.getAutocommit());
    propsMap.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, properties.getTimeout());
    propsMap.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    propsMap.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    propsMap.put(ConsumerConfig.GROUP_ID_CONFIG, properties.getGroupid());
    propsMap.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, properties.getAutooffset());
    return propsMap;
}
@KafkaListener(id = "ID1", topics = "${config.topic}", group = "${config.groupid}")
public void listen(ConsumerRecord<?, ?> record)
{
    logger.info("Message Consumed " + record);
    logger.info("Partition From which Record is Received " + record.partition());
    this.message = record.value().toString();
}
bootstrap.servers = [localhost:9092, localhost:9093, localhost:9094]
If you use a regular Java consumer, it will automatically read from multiple brokers. There is no special code you need to write. Just subscribe to the topic(s) you want to consume, and the consumer will connect to the corresponding brokers automatically. You only provide a "single entry point" broker -- the client figures out all other brokers of the cluster automatically.
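For example, a plain consumer configured like this (a sketch with placeholder topic and group names, using the newer poll(Duration) API) will read from all three brokers and all partitions with no broker-specific code:

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MultiBrokerConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Any one broker is enough to bootstrap; listing several only adds redundancy.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092,localhost:9093,localhost:9094");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("my-topic"));
            while (true) {
                // Records arrive from whichever partitions (and therefore brokers) hold the data.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}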
The number of Kafka broker nodes in the cluster has nothing to do with the consumer logic. Nodes in the cluster are only used for fault tolerance and the bootstrap process. Placing messages in different partitions of the topic based on some custom logic is also not going to affect the consumer logic. Even if you have a single consumer, that consumer will consume messages from all partitions of the subscribed topic. I request you to check your code against a Kafka cluster with a single broker node...
