How to get Kafka Producer messages count - java

I use the following code to create a producer which produces around 2000 messages.
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Properties;

public class ProducerDemoWithCallback {
    public static void main(String[] args) {
        final Logger logger = LoggerFactory.getLogger(ProducerDemoWithCallback.class);

        String bootstrapServers = "localhost:9092";
        Properties properties = new Properties();
        properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // create the producer
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);

        for (int i = 0; i < 2000; i++) {
            // create a producer record
            ProducerRecord<String, String> record =
                    new ProducerRecord<String, String>("TwitterProducer", "Hello World " + Integer.toString(i));
            // send data - asynchronous
            producer.send(record, new Callback() {
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    // executes every time a record is successfully sent or an exception is thrown
                    if (e == null) {
                        // the record was successfully sent
                        logger.info("Received new metadata. \n" +
                                "Topic: " + recordMetadata.topic() + "\n" +
                                "Partition: " + recordMetadata.partition() + "\n" +
                                "Offset: " + recordMetadata.offset() + "\n" +
                                "Timestamp: " + recordMetadata.timestamp());
                    } else {
                        logger.error("Error while producing", e);
                    }
                }
            });
        }

        // flush data
        producer.flush();
        // flush and close producer
        producer.close();
    }
}
I want to count those messages and get an int value.
I use this command and it works, but I am trying to get the same count from code:
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic TwitterProducer --time -1
and the result is
TwitterProducer:0:2000
My code to do the same programmatically looks like this, but I'm not sure it is the correct way to get the count:
int valueCount = (int) recordMetadata.offset();
System.out.println("Offset value " + valueCount);
Can someone help me get the message count of a Kafka topic from code?

You can have a look at the implementation details of GetOffsetShell.
Here is a simplified version of that code, re-written in Java:
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.*;
import java.util.stream.Collectors;

public class GetOffsetCommand {
    private static final Set<String> TopicNames = new HashSet<>();

    static {
        TopicNames.add("my-topic");
        TopicNames.add("not-my-topic");
    }

    public static void main(String[] args) {
        TopicNames.forEach(topicName -> {
            final Map<TopicPartition, Long> offsets = getOffsets(topicName);
            new ArrayList<>(offsets.entrySet()).forEach(System.out::println);
            System.out.println(topicName + ":" + offsets.values().stream().reduce(0L, Long::sum));
        });
    }

    // end offsets of all the topic's partitions = the "latest" offsets,
    // i.e. what GetOffsetShell prints with --time -1
    private static Map<TopicPartition, Long> getOffsets(String topicName) {
        try (KafkaConsumer<String, String> consumer = makeKafkaConsumer()) {
            final List<TopicPartition> partitions = listTopicPartitions(consumer, topicName);
            return consumer.endOffsets(partitions);
        }
    }

    private static KafkaConsumer<String, String> makeKafkaConsumer() {
        final Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "get-offset-command");
        return new KafkaConsumer<>(props);
    }

    private static List<TopicPartition> listTopicPartitions(KafkaConsumer<String, String> consumer, String topicName) {
        return consumer.listTopics().entrySet().stream()
                .filter(t -> topicName.equals(t.getKey()))
                .flatMap(t -> t.getValue().stream())
                .map(p -> new TopicPartition(p.topic(), p.partition()))
                .collect(Collectors.toList());
    }
}
which prints the end offset of each of the topic's partitions plus their sum (the total number of messages), like:
my-topic-0=184
my-topic-2=187
my-topic-4=189
my-topic-1=196
my-topic-3=243
my-topic:999
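For reference, --time -1 in the GetOffsetShell command asks for the latest offsets (the log end of each partition), while --time -2 would return the earliest offsets; the per-partition difference between the two is the number of messages currently retained.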

Why do you want to get that value? If you share more detail about the purpose, I can give you a better tip.
To your last question: no, the offset is not the correct way to get the count of messages. It would only work if your topic had a single partition and a single producer; you need to account for the topic having several partitions.
If you want to get the number of messages sent by each producer, you can count them in the onCompletion() callback, as in the sketch below.
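Here is a minimal sketch of that idea (my own addition on top of the question's producer; it assumes a java.util.concurrent.atomic.AtomicLong for thread-safe counting, since callbacks run on the producer's I/O thread):
final AtomicLong sentCount = new AtomicLong();
for (int i = 0; i < 2000; i++) {
    ProducerRecord<String, String> record =
            new ProducerRecord<>("TwitterProducer", "Hello World " + i);
    producer.send(record, (recordMetadata, e) -> {
        if (e == null) {
            sentCount.incrementAndGet(); // broker acknowledged this record
        }
    });
}
producer.flush(); // blocks until in-flight sends complete, so all callbacks have run
System.out.println("Messages sent: " + sentCount.get());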
Or you can get the last offset using the Consumer client, like this (note that subscribe() requires a group.id, and partitions are only assigned after a poll()):
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "your-brokers");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "your-group-id");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

Consumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("topic_name"));
consumer.poll(Duration.ofMillis(100)); // triggers the partition assignment

Collection<TopicPartition> partitions = consumer.assignment();
consumer.seekToEnd(partitions);

long total = 0;
for (TopicPartition tp : partitions) {
    total += consumer.position(tp); // end offset of this partition
}
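Note that a partition's end offset is the offset of the next record to be written, so summing the end offsets counts every record ever appended, including records since removed by retention or log compaction. To count only the records currently in the topic, subtract each partition's beginningOffsets() value from its endOffsets() value.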

Related

How to use a fixed number of threads shared between the vert.x kafka consumers?

I have a use case with multiple Kafka consumers.
For each Kafka consumer a thread is allocated; when I check the thread dump I see "vert.x-kafka-consumer-thread-0", "vert.x-kafka-consumer-thread-1", ....
I need to know if there is a way to use a fixed number of threads shared between the consumers.
Following is the code that I tried.
public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    config.put("bootstrap.servers", "localhost:9092");
    config.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    config.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    config.put("group.id", "my_group_1");
    config.put("auto.offset.reset", "earliest");
    config.put("enable.auto.commit", "false");

    Vertx vertx = Vertx.vertx();

    // all six consumers are created from the same config here
    KafkaConsumer<String, String> consumer1 = KafkaConsumer.create(vertx, config);
    KafkaConsumer<String, String> consumer2 = KafkaConsumer.create(vertx, config);
    KafkaConsumer<String, String> consumer3 = KafkaConsumer.create(vertx, config);
    KafkaConsumer<String, String> consumer4 = KafkaConsumer.create(vertx, config);
    KafkaConsumer<String, String> consumer5 = KafkaConsumer.create(vertx, config);
    KafkaConsumer<String, String> consumer6 = KafkaConsumer.create(vertx, config);

    subscribe(consumer1);
    subscribe(consumer2);
    subscribe(consumer3);
    subscribe(consumer4);
    subscribe(consumer5);
    subscribe(consumer6);
}

private static void subscribe(KafkaConsumer<String, String> consumer) {
    consumer.handler(record -> {
        System.out.println("Processing key=" + record.key() + ",value=" + record.value() +
                ",partition=" + record.partition() + ",offset=" + record.offset());
    });
    Set<String> topics = new HashSet<>();
    topics.add("test");
    consumer.subscribe(topics)
            .onSuccess(v -> System.out.println("subscribed"))
            .onFailure(cause -> System.out.println("Could not subscribe " + cause.getMessage()));
}

Kafka consumer not receiving old messages

My Kafka consumer does not receive messages that were produced before the consumer was started.
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class MyKafkaConsumer {
    private final KafkaConsumer<String, String> consumer;
    private final String TOPIC = "javaapp";
    private final String BOOTSTRAP_SERVERS = "localhost:9092";
    private int receivedCounter = 0;
    private ExecutorService executorService = Executors.newFixedThreadPool(1);
    private BlockingQueue<ConsumerRecords<String, String>> queue = new LinkedBlockingQueue<>(500000);

    private MyKafkaConsumer() {
        final Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "KafkaGroup6");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList(TOPIC));
    }

    public static void main(String[] args) throws InterruptedException {
        MyKafkaConsumer perfKafkaConsumer = new MyKafkaConsumer();
        perfKafkaConsumer.consumeMessage();
        perfKafkaConsumer.runConsumer();
    }

    private void runConsumer() throws InterruptedException {
        consumer.poll(Duration.ofMillis(1000)); // note: records from this first poll are discarded
        while (true) {
            final ConsumerRecords<String, String> consumerRecords = consumer.poll(Duration.ofMillis(10000));
            if (!consumerRecords.isEmpty()) {
                System.out.println("Adding result in queue " + queue.size());
                queue.put(consumerRecords);
            }
            consumer.commitAsync();
        }
    }

    private void consumeMessage() {
        System.out.println("Consumer starts at " + Instant.now());
        executorService.submit(() -> {
            while (true) {
                ConsumerRecords<String, String> poll = queue.take();
                poll.forEach(record -> {
                    System.out.println("Received " + ++receivedCounter + " time " + Instant.now(Clock.systemUTC()));
                });
            }
        });
    }
}
The ConsumerRecords are always empty.
I checked the offsets using the Kafka tool.
I have also tried with a different group name; it's not working, same issue, i.e. poll() returns empty records.
However, if I start my consumer before the producer, then it receives the messages. (kafka-clients version 2.4.1)
The auto.offset.reset consumer setting controls where a new consumer group begins consuming from a topic. By default it is set to 'latest', which sets the consumer group's initial position to the latest offset. Set it to 'earliest' if a new consumer group should start at the earliest offset in the topic.
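For example, it is a one-line addition to the consumer properties in the question:
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
Keep in mind the setting only applies when the group has no committed offsets yet, so the effect is easiest to see with a fresh group id.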

Spring Kafka polling using Consumer

I am using consumer.poll(Duration d) to fetch records. I have only 10 records, for testing purposes, in a Kafka topic spread across 6 partitions. I have disabled auto commit and I am not committing manually either (again, for testing purposes only). When poll is executed, it does not fetch data from all partitions; I need to run poll in a loop to get all the data. I haven't overridden parameters like max.poll.records or fetch.max.bytes from their default values. What could be the reason? Please note that I have only this one consumer for the given topic and group id, so I expect all the partitions to be assigned to it.
private Consumer<String, Object> createConsumer() {
    ConsumerFactory<String, Object> consumerFactory = deadLetterConsumerFactory();
    Consumer<String, Object> consumer = consumerFactory.createConsumer();
    consumer.subscribe(Collections.singletonList(kafkaConfigProperties.getDeadLetterTopic()));
    return consumer;
}

try {
    consumer = createConsumer();
    ConsumerRecords<String, Object> records = consumer.poll(Duration.ofMillis(5000));
    processMessages(records, ...);
} catch (Exception e) {
    ...
} finally {
    if (consumer != null) {
        consumer.unsubscribe();
        consumer.close();
    }
}
EDIT
Here are the details:
ConsumerFactory<String, Object> deadLetterConsumerFactory() {
    Properties properties = new Properties();
    properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, server);
    properties.put(SCHEMA_REGISTRY_URL, url);
    properties.put(ConsumerConfig.CLIENT_ID_CONFIG,
            "myid" + "-" + CONSUMER_CLIENT_ID_SEQUENCE.getAndIncrement());
    properties.put(SSL_ENDPOINT_IDFN_ALGM, alg);
    properties.put(SaslConfigs.SASL_MECHANISM, saslmech);
    properties.put(REQUEST_TIMEOUT, timeout);
    properties.put(SaslConfigs.SASL_JAAS_CONFIG, config);
    properties.put(SECURITY_PROTOCOL, protocol);
    properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
    properties.put(ConsumerConfig.GROUP_ID_CONFIG, "groupid");
    properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

    // copy everything into a Map for the Spring factory
    Map<String, Object> map = new HashMap<>();
    properties.forEach((key, value) -> map.put((String) key, value));
    return new DefaultKafkaConsumerFactory<>(map);
}

Accessing a Kafka topic with two processes

I have a Kafka producer class which works fine. The producer fills the Kafka topic. Its code is the following:
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.LongSerializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.io.File;
import java.util.Properties;
import java.util.Scanner;
import java.util.concurrent.TimeUnit;

public class kafka_test {
    private final static String TOPIC = "flinkTopic";
    private final static String BOOTSTRAP_SERVERS = "10.32.0.2:9092,10.32.0.3:9092,10.32.0.4:9092";

    public FlinkKafkaConsumer<String> createStringConsumerForTopic(
            String topic, String kafkaAddress, String kafkaGroup) {
        // ************************** KAFKA Properties ******
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", kafkaAddress);
        props.setProperty("group.id", kafkaGroup);
        FlinkKafkaConsumer<String> myconsumer = new FlinkKafkaConsumer<>(
                topic, new SimpleStringSchema(), props);
        myconsumer.setStartFromLatest();
        return myconsumer;
    }

    private static Producer<Long, String> createProducer() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "MyKafkaProducer");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        return new KafkaProducer<>(props);
    }

    // note: this creates and closes a new producer for every message
    public void runProducer(String msg) throws Exception {
        final Producer<Long, String> producer = createProducer();
        try {
            final ProducerRecord<Long, String> record = new ProducerRecord<>(TOPIC, msg);
            RecordMetadata metadata = producer.send(record).get();
            System.out.printf("sent record(key=%s value='%s')" + " metadata(partition=%d, offset=%d)\n",
                    record.key(), record.value(), metadata.partition(), metadata.offset());
        } finally {
            producer.flush();
            producer.close();
        }
    }
}

public class producerTest {
    public static void main(String[] args) throws Exception {
        kafka_test objKafka = new kafka_test();
        String pathFile = "/home/cfms11/IdeaProjects/pooyaflink2/KafkaTest/quickstart/lastDay4.csv";
        String delimiter = "\n";
        Scanner scanner = new Scanner(new File(pathFile));
        scanner.useDelimiter(delimiter);
        int i = 0;
        while (scanner.hasNext()) {
            if (i == 0)
                TimeUnit.MINUTES.sleep(1);
            objKafka.runProducer(scanner.next());
            i++;
        }
        scanner.close();
    }
}
I want to provide data to my Flink program, so I use Kafka. This is the part of the code that consumes data from the Kafka topic:
Properties props = new Properties();
props.setProperty("bootstrap.servers",
        "10.32.0.2:9092,10.32.0.3:9092,10.32.0.4:9092");
props.setProperty("group.id", kafkaGroup);
FlinkKafkaConsumer<String> myconsumer = new FlinkKafkaConsumer<>(
        "flinkTopic", new SimpleStringSchema(), props);
myconsumer.setStartFromEarliest();
DataStream<String> text = env.addSource(myconsumer);
I want to run the producer code at the same time that my program is running. My goal is that the producer sends one record to the topic and the consumer polls that record from the topic at the same time.
Would you please tell me how this is possible and how to manage it?
I think you need to create two class files: one is the producer, the other is the consumer. Create the topic first and then run the consumer, or run the producer directly.
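If it helps, the topic can be created up front with the CLI that ships with Kafka (the partition and replication counts below are placeholders; older Kafka versions take --zookeeper instead of --bootstrap-server):
bin/kafka-topics.sh --create --topic flinkTopic --bootstrap-server 10.32.0.2:9092 --partitions 3 --replication-factor 1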

Get all kafka messages in a queue and stop streaming in java

I need to execute a job at night that gets all the messages in a Kafka queue and executes a process with them. I'm able to get the messages, but the Kafka stream keeps waiting for more messages and I'm not able to continue with my process. I have the following code:
...
private ConsumerConnector consumerConnector;
private final static String TOPIC = "test";

public MessageStreamConsumer() {
    Properties properties = new Properties();
    properties.put("zookeeper.connect", "localhost:2181");
    properties.put("group.id", "test-group");
    ConsumerConfig consumerConfig = new ConsumerConfig(properties);
    consumerConnector = Consumer.createJavaConsumerConnector(consumerConfig);
}

public List<String> getMessages() {
    Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
    topicCountMap.put(TOPIC, new Integer(1));
    Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumerConnector
            .createMessageStreams(topicCountMap);
    KafkaStream<byte[], byte[]> stream = consumerMap.get(TOPIC).get(0);
    ConsumerIterator<byte[], byte[]> it = stream.iterator();
    List<String> messages = new ArrayList<>();
    while (it.hasNext())
        messages.add(new String(it.next().message()));
    return messages;
}
The code is able to get the messages, but after it processes the last message it blocks on the line:
while (it.hasNext())
The question is: how can I get all the messages from Kafka, stop the stream, and continue with my other tasks?
I hope you can help me. Thanks.
It seems that the Kafka stream does not support consuming from the beginning.
You could create a native Kafka consumer and set auto.offset.reset to earliest; then it will consume messages from the beginning.
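A minimal sketch of the relevant configuration for the native consumer (broker address and group id are placeholders):
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "nightly-job");
// a new consumer group with this setting starts from the first available offset
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);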
Something like this may work. Basically, the idea is to use a Kafka consumer and poll until you get some records, then stop when you get an empty batch.
package kafka.examples;

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Collections;
import java.util.Date;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class Consumer1 extends Thread
{
    private final KafkaConsumer<Integer, String> consumer;
    private final String topic;
    private final DateFormat df;
    private final String logTag;
    private boolean noMoreData = false;
    private boolean gotData = false;
    private int messagesReceived = 0;
    AtomicBoolean isRunning = new AtomicBoolean(true);
    CountDownLatch shutdownLatch = new CountDownLatch(1);

    public Consumer1(Properties props)
    {
        logTag = "Consumer1";
        consumer = new KafkaConsumer<>(props);
        this.topic = props.getProperty("topic");
        this.df = new SimpleDateFormat("HH:mm:ss");
        consumer.subscribe(Collections.singletonList(this.topic));
    }

    public void getMessages() {
        System.out.println("Getting messages...");
        while (noMoreData == false) {
            //System.out.println(logTag + ": Doing work...");
            ConsumerRecords<Integer, String> records = consumer.poll(1000);
            Date now = Calendar.getInstance().getTime();
            int recordsCount = records.count();
            messagesReceived += recordsCount;
            System.out.println("recordsCount: " + recordsCount);
            if (recordsCount > 0) {
                gotData = true;
            }
            if (gotData && recordsCount == 0) {
                noMoreData = true;
            }
            for (ConsumerRecord<Integer, String> record : records) {
                int kafkaKey = record.key();
                String kafkaValue = record.value();
                System.out.println(this.df.format(now) + " " + logTag + ":" +
                        " Received: {" + kafkaKey + ":" + kafkaValue + "}" +
                        ", partition(" + record.partition() + ")" +
                        ", offset(" + record.offset() + ")");
            }
        }
        System.out.println("Received " + messagesReceived + " messages");
    }

    public void processMessages() {
        System.out.println("Processing messages...");
    }

    public void run() {
        getMessages();
        processMessages();
    }
}
I'm currently developing with Kafka 0.10.0.1 and found mixed information regarding the use of the consumer property auto.offset.reset, so I've done some experiments to figure out what actually happens.
Based on those, I now understand it this way: when you set the property
auto.offset.reset=earliest
it positions the consumer EITHER at the first available message in the assigned partitions (when no commits have been made on the partitions) OR at the last committed partition offsets (note that you always commit the last read offset + 1, or else you'll be re-reading the last committed message on each restart of your consumer).
Alternatively, you can leave auto.offset.reset unset, which means the default value of 'latest' will be used. In that case you do not receive any old messages when connecting the consumer - only messages published to the topic after the consumer connected will be received.
As a conclusion: if you want to ensure you receive all available messages for a certain topic and its assigned partitions, you'll have to call seekToBeginning().
It seems advisable to call poll(0L) first to ensure your consumer gets partitions assigned (or to implement your logic in a ConsumerRebalanceListener!), then seek each of the assigned partitions to the beginning:
kafkaConsumer.poll(0L);
kafkaConsumer.seekToBeginning(kafkaConsumer.assignment());
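For the ConsumerRebalanceListener variant, a minimal sketch of the rewind-on-assignment pattern (the topic name is a placeholder):
kafkaConsumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // nothing to clean up in this sketch
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // rewind every newly assigned partition to its first available offset
        kafkaConsumer.seekToBeginning(partitions);
    }
});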
