We are migrating to Kafka, and I need to create a monitoring POC service that periodically checks the unprocessed message count in the Kafka queue and takes some action based on that count. However, this service must not read or process the messages; designated consumers will do that. On every cron run, the service only needs the count of unprocessed messages currently in the queue.
So far, piecing together multiple examples, I have this:
public void stats() throws ExecutionException, InterruptedException {
    Map<String, Object> props = new HashMap<>();
    // list of host:port pairs used for establishing the initial connections to the Kafka cluster
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    try (final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(Arrays.asList(topicName));
        while (true) {
            Thread.sleep(1000);
            ConsumerRecords<String, String> records = consumer.poll(1000);
            if (!records.isEmpty()) {
                System.out.println("records is not empty = " + records.count() + " " + records);
            }
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
                Set<TopicPartition> partitions = consumer.assignment();
                //consumer.seekToBeginning(partitions);
                Map<TopicPartition, Long> offsets = consumer.endOffsets(partitions);
                for (TopicPartition partition : offsets.keySet()) {
                    OffsetAndMetadata commitOffset = consumer.committed(new TopicPartition(partition.topic(), partition.partition()));
                    Long lag = commitOffset == null ? offsets.get(partition) : offsets.get(partition) - commitOffset.offset();
                    System.out.println("lag = " + lag);
                    System.out.printf("partition %s is at %d\n", partition.topic(), offsets.get(partition));
                }
            }
        }
    }
}
The code works fine sometimes, but sometimes it gives wrong output. Please let me know what I am doing wrong.
Don't subscribe to the topic; just create a consumer with the same group to get the endOffsets.
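For example, here is a minimal, self-contained sketch of that approach (the class and method names are just for illustration, and bootstrapServers/groupId/topicName are assumed to come from your configuration): it creates a consumer with the same group id as the real consumers, never subscribes or polls, and reports the difference between each partition's end offset and the group's committed offset.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LagMonitor {

    // Total number of messages the consumer group has not yet processed for the topic.
    public static long totalLag(String bootstrapServers, String groupId, String topicName) {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);          // same group as the real consumers
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);  // this service never commits anything
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Discover the partitions of the topic without subscribing or polling.
            List<TopicPartition> partitions = consumer.partitionsFor(topicName).stream()
                    .map(p -> new TopicPartition(p.topic(), p.partition()))
                    .collect(Collectors.toList());

            // Latest (log end) offset per partition.
            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);

            long lag = 0;
            for (TopicPartition tp : partitions) {
                OffsetAndMetadata committed = consumer.committed(tp);
                long committedOffset = committed == null ? 0L : committed.offset();
                lag += endOffsets.get(tp) - committedOffset;
            }
            return lag;
        }
    }
}
Your cron job can call totalLag() on every run and trigger whatever action you need once the value crosses a threshold; because nothing is subscribed or polled here, the designated consumers and their group rebalances are unaffected.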
Related
I am looking for a way to consume a set of messages from my Kafka topic within a specific offset range (assume my partition has offsets from 200 to 300, and I want to consume the messages from offset 250 to 270).
I am using the code below, where I can specify the initial offset, but it consumes all the messages from 250 until the end. Is there any way/attribute available to set an end offset so that messages are consumed only up to that point?
@KafkaListener(id = "KafkaListener",
        topics = "${kafka.topic.name}",
        containerFactory = "kafkaManualAckListenerContainerFactory",
        errorHandler = "${kafka.error.handler}",
        topicPartitions = @TopicPartition(topic = "${kafka.topic.name}",
                partitionOffsets = {
                        @PartitionOffset(partition = "0", initialOffset = "250"),
                        @PartitionOffset(partition = "1", initialOffset = "250")
                }))
You can use seek() in order to force the consumer to start consuming from a specific offset and then poll() until you reach the target end offset.
public void seek(TopicPartition partition, long offset)
Overrides the fetch offsets that the consumer will use on the next poll(timeout). If this API is invoked for the same partition more than once, the latest offset will be used on the next poll(). Note that you may lose data if this API is arbitrarily used in the middle of consumption to reset the fetch offsets.
For example, let's assume you want to start from offset 200:
TopicPartition tp = new TopicPartition("myTopic", 0);
long startOffset = 200L;
long endOffset = 300L;

List<TopicPartition> topics = Arrays.asList(tp);
consumer.assign(topics);
consumer.seek(tp, startOffset);
Now you just need to keep poll()ing until endOffset is reached:
boolean run = true;
while (run) {
    ConsumerRecords<String, String> records = consumer.poll(1000);
    for (ConsumerRecord<String, String> record : records) {
        // Do whatever you want to do with `record`

        // Check if the end offset has been reached
        if (record.offset() == endOffset) {
            run = false;
            break;
        }
    }
}
KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<String, String>(properties);

// partition and offset to read the data from
TopicPartition partitionToReadFrom = new TopicPartition("myTopic", 0);
long offsetToReadFrom = 250L;

// assign the partition explicitly (no subscribe needed)
kafkaConsumer.assign(Collections.singletonList(partitionToReadFrom));

// seek is mostly used to replay data or fetch a specific message
kafkaConsumer.seek(partitionToReadFrom, offsetToReadFrom);

boolean keepOnReading = true;
int numberOfMessagesRead = 0;

while (keepOnReading) {
    ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        numberOfMessagesRead++;
        logger.info("Key: " + record.key() + ", Value: " + record.value());
        logger.info("Partition: " + record.partition() + ", Offset: " + record.offset());
        if (record.offset() == 270L) {
            keepOnReading = false;
            break;
        }
    }
}
I hope this helps you !!
Kafka version: 0.9.0.1
If n = 20, I have to get the last 20 messages of a topic.
I tried with
kafkaConsumer.seekToBeginning();
But it retrieves all the messages. I need to get only the last 20 messages.
This topic may have hundreds of thousands of records
public List<JSONObject> consumeMessages(String kafkaTopicName) {
    KafkaConsumer<String, String> kafkaConsumer = null;
    boolean flag = true;
    List<JSONObject> messagesFromKafka = new ArrayList<>();
    int recordCount = 0;
    int i = 0;
    int maxMessagesToReturn = 20;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "project.group.id");
    props.put("max.partition.fetch.bytes", "1048576000");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    kafkaConsumer = new KafkaConsumer<>(props);
    kafkaConsumer.subscribe(Arrays.asList(kafkaTopicName));
    TopicPartition topicPartition = new TopicPartition(kafkaTopicName, 0);
    LOGGER.info("Subscribed to topic " + kafkaConsumer.listTopics());

    while (flag) {
        // will consume all the messages and store in records
        ConsumerRecords<String, String> records = kafkaConsumer.poll(1000);
        kafkaConsumer.seekToBeginning(topicPartition);

        // getting total records count
        recordCount = records.count();
        LOGGER.info("recordCount " + recordCount);

        for (ConsumerRecord<String, String> record : records) {
            if (record.value() != null) {
                if (i >= recordCount - maxMessagesToReturn) {
                    // adding last 20 messages to messagesFromKafka
                    LOGGER.info("kafkaMessage " + record.value());
                    messagesFromKafka.add(new JSONObject(record.value()));
                }
                i++;
            }
        }
        if (recordCount > 0) {
            flag = false;
        }
    }
    kafkaConsumer.close();
    return messagesFromKafka;
}
You can use kafkaConsumer.seekToEnd(Collection<TopicPartition> partitions) to seek to the last offset of the given partition(s). As per the documentation:
"Seek to the last offset for each of the given partitions. This function evaluates lazily, seeking to the final offset in all partitions only when poll(Duration) or position(TopicPartition) are called. If no partitions are provided, seek to the final offset for all of the currently assigned partitions."
Then you can retrieve the position of a particular partition using position(TopicPartition partition).
Then you can subtract 20 from it and use kafkaConsumer.seek(TopicPartition partition, long offset) to get to the most recent 20 messages.
Simply,
kafkaConsumer.seekToEnd(partitionList);
long endPosition = kafkaConsumer.position(topicPartition);
long recentMessagesStartPosition = endPosition - maxMessagesToReturn;
kafkaConsumer.seek(topicPartition, recentMessagesStartPosition);
Now you can retrieve the most recent 20 messages using poll().
This is the simple logic, but if you have multiple partitions, you have to handle those cases as well; see the sketch below. I did not try this, but I hope you get the concept.
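To give an idea of the multi-partition case, here is one possible sketch (not from the original answer; the helper name readLastN is made up, and it assumes a reasonably recent kafka-clients version with the Collection/Duration based APIs quoted above, not the 0.9 signatures). It rewinds every partition of the topic by up to n records, clamping at the first available offset, and polls until each partition has been read back up to its end offset. Note that it returns up to n records per partition; if you need the last n records of the topic overall, you would still have to merge and sort them, for example by timestamp.
// Needs: org.apache.kafka.clients.consumer.*, org.apache.kafka.common.TopicPartition,
// org.apache.kafka.common.PartitionInfo, java.time.Duration, java.util.*
public static List<ConsumerRecord<String, String>> readLastN(
        KafkaConsumer<String, String> consumer, String topic, int n) {

    List<TopicPartition> partitions = new ArrayList<>();
    for (PartitionInfo info : consumer.partitionsFor(topic)) {
        partitions.add(new TopicPartition(info.topic(), info.partition()));
    }
    consumer.assign(partitions);

    Map<TopicPartition, Long> beginningOffsets = consumer.beginningOffsets(partitions);
    Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);

    // Rewind each partition by n records, but never before its first available offset.
    for (TopicPartition tp : partitions) {
        consumer.seek(tp, Math.max(beginningOffsets.get(tp), endOffsets.get(tp) - n));
    }

    // Poll until every partition has been read back up to its end offset.
    List<ConsumerRecord<String, String>> lastRecords = new ArrayList<>();
    while (partitions.stream().anyMatch(tp -> consumer.position(tp) < endOffsets.get(tp))) {
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
            lastRecords.add(record);
        }
    }
    return lastRecords;
}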
I'm trying to consume records from a MySQL table which contains 3 columns (Axis, Price, lastname) with their datatypes (int, decimal(14,4), varchar(50)) respectively.
I inserted one record which has the following data (1, 5.0000, John).
The following Java code (which consumes the Avro records from a topic created by a MySQL connector in the Confluent platform) reads the decimal column Price as a java.nio.HeapByteBuffer type, so I can't access the value of the column when I receive it.
Is there a way to extract or convert the received data to a Java decimal or double data type?
Here is the MySQL connector configuration:
{
  "name": "mysql-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "incrementing.column.name": "Axis",
    "tasks.max": "1",
    "table.whitelist": "ticket",
    "mode": "incrementing",
    "topic.prefix": "mysql-",
    "name": "mysql-source",
    "validate.non.null": "false",
    "connection.url": "jdbc:mysql://localhost:3306/ticket?user=user&password=password"
  }
}
Here is the consumer code:
public static void main(String[] args) throws InterruptedException, IOException {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "io.confluent.kafka.serializers.KafkaAvroDeserializer");
    props.put("schema.registry.url", "http://localhost:8081");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    String topic = "sql-ticket";
    final Consumer<String, GenericRecord> consumer = new KafkaConsumer<String, GenericRecord>(props);
    consumer.subscribe(Arrays.asList(topic));

    try {
        while (true) {
            ConsumerRecords<String, GenericRecord> records = consumer.poll(100);
            for (ConsumerRecord<String, GenericRecord> record : records) {
                System.out.printf("value = %s \n", record.value().get("Price"));
            }
        }
    } finally {
        consumer.close();
    }
}
Alright, so I finally found the solution.
The HeapByteBuffer needs to be converted to a byte[] array. I then used BigInteger, which constructs the value from that byte array, created a BigDecimal from the BigInteger, and set the decimal point with movePointLeft(4), which is the scale (in my case: 4). Everything worked as expected.
// Price arrives as a ByteBuffer holding the unscaled decimal value
ByteBuffer buf = (ByteBuffer) record.value().get("Price");
byte[] arr = new byte[buf.remaining()];
buf.get(arr);
// interpret the bytes as the (positive) unscaled value, then apply the scale of 4
BigInteger bi = new BigInteger(1, arr);
BigDecimal bd = new BigDecimal(bi).movePointLeft(4);
System.out.println(bd);
Here are the results (left is the consumer output, right is MySQL; screenshot omitted).
I am currently trying to use a KStream-to-KTable join to enrich a Kafka topic. For my proof of concept, I have a KStream with about 600,000 records, all with the same key, and a KTable created from a topic with a single key/value record whose key matches the key of the 600,000 records in the topic the KStream is created from.
When I use a left join (via the code below), all of the records return NULL on the ValueJoiner.
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe-json-parse-" + System.currentTimeMillis());
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "xxx.xx.xx.xxx:9092");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, "org.apache.kafka.streams.processor.WallclockTimestampExtractor");
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 5);

final StreamsBuilder builder = new StreamsBuilder();

// Build a Kafka Stream from the Netcool Input Topic
KStream<String, String> source = builder.stream("output-100k");

// Join the KStream to the KTable
KStream<String, String> enriched_output = source
        .leftJoin(netcool_enrichment, (orig_msg, description) -> {
            String new_msg = jsonEnricher(orig_msg, description);
            if (description != null) {
                System.out.println("\n[DEBUG] Enriched Input Orig: " + orig_msg);
                System.out.println("[DEBUG] Enriched Input Desc: " + description);
                System.out.println("[DEBUG] Enriched Output: " + new_msg);
            }
            return new_msg;
        });
Here is a sample output record (using a forEach loop) from the source KStream:
[KSTREAM] Key: ismlogs
[KSTREAM] Value: {"severity":"debug","ingested_timestamp":"2018-07-18T19:32:47.227Z","#timestamp":"2018-06-28T23:36:31.000Z","offset":482,"#metadata":{"beat":"filebeat","topic":"input-100k","type":"doc","version":"6.2.2"},"beat":{"hostname":"abc.dec.com","name":"abc.dec.com","version":"6.2.2"},"source":"/root/100k-raw.txt","message":"Thu Jun 28 23:36:31 2018 Debug: Checking status of file /ism/profiles/active/test.xml","key":"ismlogs","tags":["ismlogs"]}
I have tried converting the KTable back to a KStream and using a forEach loop over the converted stream, and I can verify the records are actually in the KTable.
KTable<String, String> enrichment = builder.table("enrichment");
KStream<String, String> ktable_debug = enrichment.toStream();
ktable_debug.foreach(new ForeachAction<String, String>() {
    public void apply(String key, String value) {
        System.out.println("[KTABLE] Key: " + key);
        System.out.println("[KTABLE] Value: " + value);
    }
});
The code above outputs:
[KTABLE] Key: "ismlogs"
[KTABLE] Value: "ISM Logs"
According to your console messages, the keys are different, and therefore they won't join:
[KSTREAM] Key: ismlogs
[KTABLE] Key: "ismlogs"
In the case of the KTable, the key is actually "ismlogs" with the double quotes, so it never matches the KStream key; one way to fix that is sketched below.
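One possible fix, sketched here rather than taken from the original answer: strip the surrounding double quotes from the enrichment topic before materializing it as a KTable, so its keys match the plain-string keys of the KStream (the cleaner long-term fix is to produce the enrichment topic with a plain string key in the first place). The stripQuotes helper is hypothetical, the snippet relies on the default String serdes already configured in the props above, and it needs org.apache.kafka.streams.KeyValue.
// Build the enrichment KTable from a re-keyed stream whose quotes have been removed.
KTable<String, String> netcool_enrichment = builder.<String, String>stream("enrichment")
        .map((key, value) -> KeyValue.pair(stripQuotes(key), stripQuotes(value)))
        .groupByKey()
        .reduce((oldValue, newValue) -> newValue);  // keep the latest value per key

// Hypothetical helper (class level): removes leading/trailing double quotes, if present.
private static String stripQuotes(String s) {
    return s == null ? null : s.replaceAll("^\"|\"$", "");
}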
I am looking for a Java client (Kafka consumer) to consume messages from multiple brokers. Please advise.
Below is the code written to publish messages to multiple brokers using a simple partitioner.
The topic is created with a replication factor of 2 and 3 partitions.
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster)
{
    List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
    int numPartitions = partitions.size();
    logger.info("Number of Partitions " + numPartitions);
    if (keyBytes == null)
    {
        int nextValue = counter.getAndIncrement();
        List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
        if (availablePartitions.size() > 0)
        {
            int part = toPositive(nextValue) % availablePartitions.size();
            int selectedPartition = availablePartitions.get(part).partition();
            logger.info("Selected partition is " + selectedPartition);
            return selectedPartition;
        }
        else
        {
            // no partitions are available, give a non-available partition
            return toPositive(nextValue) % numPartitions;
        }
    }
    else
    {
        // hash the keyBytes to choose a partition
        return toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }
}
public void publishMessage(String message, String topic)
{
    Producer<String, String> producer = null;
    try
    {
        producer = new KafkaProducer<>(producerConfigs());
        logger.info("Topic to publish the message --" + this.topic);
        for (int i = 0; i < 10; i++)
        {
            producer.send(new ProducerRecord<String, String>(this.topic, message));
            logger.info("Message Published Successfully");
        }
    }
    catch (Exception e)
    {
        logger.error("Exception Occured " + e.getMessage());
    }
    finally
    {
        producer.close();
    }
}
public Map<String, Object> producerConfigs()
{
    loadPropertyFile();
    Map<String, Object> propsMap = new HashMap<>();
    propsMap.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
    propsMap.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    propsMap.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    propsMap.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, SimplePartitioner.class);
    propsMap.put(ProducerConfig.ACKS_CONFIG, "1");
    return propsMap;
}
public Map<String, Object> consumerConfigs() {
    Map<String, Object> propsMap = new HashMap<>();
    System.out.println("properties.getBootstrap()" + properties.getBootstrap());
    propsMap.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, properties.getBootstrap());
    propsMap.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    propsMap.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, properties.getAutocommit());
    propsMap.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, properties.getTimeout());
    propsMap.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    propsMap.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    propsMap.put(ConsumerConfig.GROUP_ID_CONFIG, properties.getGroupid());
    propsMap.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, properties.getAutooffset());
    return propsMap;
}
@KafkaListener(id = "ID1", topics = "${config.topic}", group = "${config.groupid}")
public void listen(ConsumerRecord<?, ?> record)
{
    logger.info("Message Consumed " + record);
    logger.info("Partition From which Record is Received " + record.partition());
    this.message = record.value().toString();
}
bootstrap.servers = [localhost:9092, localhost:9093, localhost:9094]
If you use a regular Java consumer, it will automatically read from multiple brokers. There is no special code you need to write. Just subscribe to the topic(s) you want to consume, and the consumer will connect to the corresponding brokers automatically. You only provide a "single entry point" broker; the client figures out all the other brokers of the cluster automatically.
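For instance, a minimal sketch (the broker addresses, topic name and group id are placeholders): a plain KafkaConsumer pointed at one or more bootstrap brokers will transparently fetch from all partition leaders across the cluster.
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MultiBrokerConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Any subset of the cluster works as the entry point; the client discovers the rest.
        props.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
        props.put("group.id", "my-group");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("my-topic"));
            while (true) {   // loop forever; stop the process to exit
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
Listing more than one broker in bootstrap.servers is only for redundancy during the initial connection; once connected, the consumer fetches the full cluster metadata on its own.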
The number of Kafka broker nodes in the cluster has nothing to do with the consumer logic. The nodes in the cluster are only used for fault tolerance and the bootstrap process. Placing messages in different partitions of the topic based on some custom logic also does not affect the consumer logic. Even if you have a single consumer, that consumer will consume messages from all partitions of the subscribed topic. I suggest you test your code against a Kafka cluster with a single broker node.