Kafka consumer.poll returning empty - java

The producer code below reads an .mp4 video file from disk and sends it to Kafka. It apparently works, since it prints "Message sent to the Kafka Topic java_in_use_topic Successfully", but consumer.poll returns an empty result:
    @RestController
    @RequestMapping(value = "/javainuse-kafka/")
    public class ApacheKafkaWebController {
        @GetMapping(value = "/producer")
        public String producer(@RequestParam("message") String message) {
            Map<String, Object> props = new HashMap<>();
            // list of host:port pairs used for establishing the initial connections to the Kafka cluster
            props.put(org.apache.kafka.clients.producer.ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                    "localhost:9092");
            props.put(org.apache.kafka.clients.producer.ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(org.apache.kafka.clients.producer.ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
            Producer<String, byte[]> producer = new KafkaProducer<>(props);
            Path path = Paths.get("C:/kafka-picture-consumer/SampleVideo_1280x720_1mb.mp4");
            ProducerRecord<String, byte[]> record = null;
            try {
                record = new ProducerRecord<>("topiccc", "keyyyyy", Files.readAllBytes(path));
            } catch (IOException e) {
                e.printStackTrace();
            }
            producer.send(record);
            producer.close();
            return "Message sent to the Kafka Topic java_in_use_topic Successfully";
        }
    }
The consumer code which will be used in a servlet:
    public class ConsumerService {
        public byte[] consumer(){
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "test");
            props.put("enable.auto.commit", "true");
            props.put("auto.commit.interval.ms", "1000");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("topiccc"));
            ConsumerRecords<String, byte[]> records = consumer.poll(100);
            System.out.println("ISSSSSSSSSSSSSSSSSSSSSSSSSSSSS EMPTYYYYYYYYYY:"+String.valueOf(records.isEmpty()));
            return records.iterator().next().value();
        }
    }

There are several possible reasons for this:
1. Your producer is not actually sending the message. You can check this by adding a callback to the send and printing the exception. If the exception is null, the send() was successful.
    producer.send(record, (recordMetadata, exception) -> {
        System.err.println(exception);
    });
2. Since you are sending an mp4 file, your Kafka broker and/or topic configuration may not allow such a large message. Check the max.message.bytes (topic) and message.max.bytes (broker) settings. If the message exceeds the limit, the send fails with a RecordTooLargeException.
3. You have to wait until the producer has completely produced the message.
4. You need to set auto.offset.reset to earliest in your consumer configuration. Then, if the group has no committed offset for that topic, the consumer starts from the first message. Without it (the default is latest), the consumer waits for the next message to arrive.
5. Your poll duration is short; you may need to increase it.
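As a rough illustration of the configuration points above, the sketch below builds the two sets of settings as plain java.util.Properties, so it runs without a broker. The key names are standard Kafka client config names. The class and method names, the fresh group id, and the 5 MB limit are assumptions chosen for illustration. Raising max.request.size on the producer is an addition to the answer: the producer enforces that limit itself, and the topic and broker limits would still need to be raised to match.

```java
import java.util.Properties;

class ConsumerPropsSketch {
    // Consumer settings; pass the result to new KafkaConsumer<>(props) in real code.
    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        // Only takes effect when the group has no committed offset yet.
        props.put("auto.offset.reset", "earliest");
        props.put("enable.auto.commit", "true");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        return props;
    }

    // Producer settings; pass the result to new KafkaProducer<>(props) in real code.
    static Properties producerProps(String bootstrapServers, int maxRequestBytes) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        // Assumption: the value must also fit within the topic and broker message limits.
        props.put("max.request.size", String.valueOf(maxRequestBytes));
        return props;
    }

    public static void main(String[] args) {
        // A fresh group id (hypothetical naming scheme) so that auto.offset.reset applies.
        Properties consumer = consumerProps("localhost:9092", "test-" + System.currentTimeMillis());
        Properties producer = producerProps("localhost:9092", 5 * 1024 * 1024);
        consumer.forEach((k, v) -> System.out.println("consumer " + k + "=" + v));
        producer.forEach((k, v) -> System.out.println("producer " + k + "=" + v));
    }
}
```

With these in place, the consumer would normally call poll in a loop with a timeout of a second or more instead of a single poll(100).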

Related

Azure Event Hubs: Kafka Consumer polls nothing in java

I'm trying to read messages in my consumer from Azure Event Hubs.
This is my consumer configuration:
    private final KafkaConsumer<Integer, String> consumer;
    private final String topic;
    public SampleConsumer(String topic) throws IOException {
        Properties props = new Properties();
        props.load(new FileReader("src/main/resources/consumer.config"));
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "MyNamespace.servicebus.windows.net:9093");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "GROUP_ID");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
        props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.IntegerDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<>(props);
        this.topic = topic;
    }
And this is my consumer.config file:
    bootstrap.servers=<MyNamespace>.servicebus.windows.net:9093
    security.protocol=SASL_SSL
    sasl.mechanism=PLAIN
    sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$**********" password="*******";
And finally, in this method I try to read the records:
    @Override
    public void doWork() {
        consumer.subscribe(Collections.singletonList(this.topic));
        ConsumerRecords<Integer, String> records = consumer.poll(Duration.ofMillis(1000));
        System.out.println(records.count());
        for (ConsumerRecord<Integer, String> record : records) {
            System.out.println("Received message: (" + record.value());
        }
    }
It connects correctly to Azure and the event hub.
It then tries to read the records, but every time it returns 0 without any error messages. I searched the internet and tried a lot of different solutions, but nothing has worked so far. There should be at least some records in the event hub.
Could someone please help me figure out what the problem is?
Are you actively producing records? If not, you may need to consume from the beginning of the topic by setting auto.offset.reset=earliest together with a new group id, because the setting only applies when the group has no committed offsets.
The default behavior is to start at the end of the topic and poll only the events that arrive after the consumer starts.

how to do Kafka syncing between two consumer groups

In my requirement I have two consumer groups. The first group (Main) just takes the data and sends it to another server. If sending to the other server fails, I need to start the other consumer group (Failed processing).
In this case the Main group continues to read and retry. At some point, when sending succeeds again, it needs to notify the other consumer group (Failed processing). The Failed processing group should then send the messages from the first failure to the last failure.
    public void StartMainstreamHandler() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> userStream = builder.stream("usertopic", Consumed.with(Serdes.String(), Serdes.String()));
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "main-streams-userstream");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "ALL my bootstrap servers");
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "500");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        //consumer_timeout_ms
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 2000);
        props.put("state.dir", "/tmp/kafka/stat");
        userStream.peek((key, value) -> System.out.println("key :" + key + " value :" + value));
        /* Send Data to other Server if Failed call other consumer */
        KafkaStreams kafkaStreams = new KafkaStreams(builder.build(), props);
        kafkaStreams.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            @Override
            public void uncaughtException(Thread t, Throwable e) {
                logger.error("Thread Name :" + t.getName() + " Error while processing:", e);
            }
        });
        kafkaStreams.cleanUp();
        kafkaStreams.start();
    }
Other consumer:
    public void StartFailstreamHandler() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> userStream = builder.stream("usertopic", Consumed.with(Serdes.String(), Serdes.String()));
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "failed-streams-userstream");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "ALL my bootstrap servers");
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "500");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        //consumer_timeout_ms
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 2000);
        props.put("state.dir", "/tmp/kafka/stat");
        // Pseudocode: wait here until a notification arrives from the other consumer, then:
        userStream.peek((key, value) -> System.out.println("key :" + key + " value :" + value));
        /* start sending */
        /* how to break when it has reached the last offset? */
        KafkaStreams kafkaStreams = new KafkaStreams(builder.build(), props);
        kafkaStreams.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            @Override
            public void uncaughtException(Thread t, Throwable e) {
                logger.error("Thread Name :" + t.getName() + " Error while processing:", e);
            }
        });
        kafkaStreams.cleanUp();
        kafkaStreams.start();
    }
Now, how can the second consumer learn the offset details so that it stops exactly at the last failure (the last failure that happened in the main consumer)?
It's not reasonably possible to sync consumer groups between clusters with messages alone, because there's no way for a consumer to seek to a particular "replicated start point".
You'd have to store extra metadata to the side, such as the timestamps at which replication started, maybe embedding that information in the record headers (assuming your version of Kafka supports them). Otherwise, you're more or less blindly copying data with minimal guarantees of delivery (and therefore might be better off using MirrorMaker anyway).
MirrorMaker 2 and Confluent Replicator are currently the only two options available for "syncing" consumer groups to make an active-active dual-cluster setup.
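One way to picture "extra metadata stored to the side" for the question's scenario is a small tracker that remembers, per partition, the first and last offsets whose delivery failed. This is only a sketch under assumptions. FailedRangeTracker and its methods are hypothetical names, it tracks offsets rather than the timestamps mentioned above, and it keeps the data in memory, which only works when both consumers run in the same JVM. Otherwise the same information would have to go into a topic or an external store.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class FailedRangeTracker {
    // partition -> {firstFailedOffset, lastFailedOffset}
    private final Map<Integer, long[]> ranges = new ConcurrentHashMap<>();

    // Called by the main consumer whenever delivery of a record fails.
    void recordFailure(int partition, long offset) {
        ranges.merge(partition, new long[] {offset, offset},
                (old, fresh) -> new long[] {Math.min(old[0], fresh[0]), Math.max(old[1], fresh[1])});
    }

    // Called once delivery works again: returns the failed range and clears it.
    Optional<long[]> drain(int partition) {
        return Optional.ofNullable(ranges.remove(partition));
    }

    public static void main(String[] args) {
        FailedRangeTracker tracker = new FailedRangeTracker();
        tracker.recordFailure(0, 7L);
        tracker.recordFailure(0, 5L);
        tracker.recordFailure(0, 9L);
        tracker.drain(0).ifPresent(r ->
                System.out.println("replay partition 0 from offset " + r[0] + " to " + r[1]));
    }
}
```

A plain KafkaConsumer (rather than a Kafka Streams application) could then seek to the first offset of the drained range and stop once it has passed the last one.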

Unable to send GenericRecord data from Kafka Producer in AVRO format

Using confluent-oss-5.0.0-2.11
My Kafka producer code is:
    public class AvroProducer {
        public static void main(String[] args) throws ExecutionException, InterruptedException {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("ZOOKEEPER_HOST", "localhost");
            //props.put("acks", "all");
            props.put("retries", 0);
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put("schema.registry.url", "http://localhost:8081");
            String topic = "confluent-new";
            Schema.Parser parser = new Schema.Parser();
            // I will get below schema string from SCHEMA REGISTRY
            Schema schema = parser.parse("{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"userName\",\"type\":\"string\"},{\"name\":\"uID\",\"type\":\"string\"},{\"name\":\"company\",\"type\":\"string\",\"default\":\"ABC\"},{\"name\":\"age\",\"type\":\"int\",\"default\":0},{\"name\":\"location\",\"type\":\"string\",\"default\":\"Noida\"}]}");
            Producer<String, GenericRecord> producer = new KafkaProducer<String, GenericRecord>(props);
            GenericRecord record = new GenericData.Record(schema);
            record.put("uID", "06080000");
            record.put("userName", "User data10");
            record.put("company", "User data10");
            record.put("age", 12);
            record.put("location", "User data10");
            ProducerRecord<String, GenericRecord> recordData = new ProducerRecord<String, GenericRecord>(topic, "ip", record);
            producer.send(recordData);
            System.out.println("Message Sent");
        }
    }
The producer code seems to be OK, and I can see "Message Sent" on the console.
Kafka Consumer code is:
    public class AvroConsumer {
        public static void main(String[] args) throws ExecutionException, InterruptedException {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("ZOOKEEPER_HOST", "localhost");
            props.put("acks", "all");
            props.put("retries", 0);
            props.put("group.id", "consumer1");
            props.put("auto.offset.reset", "latest");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
            props.put("schema.registry.url", "http://localhost:8081");
            String topic = "confluent-new";
            KafkaConsumer<String, GenericRecord> consumer = new KafkaConsumer<String, GenericRecord>(props);
            consumer.subscribe(Arrays.asList(topic));
            while(true){
                ConsumerRecords<String, GenericRecord> recs = consumer.poll(10000);
                for (ConsumerRecord<String, GenericRecord> rec : recs) {
                    System.out.printf("{AvroUtilsConsumerUser}: Recieved [key= %s, value= %s]\n", rec.key(), rec.value());
                }
            }
        }
    }
I am unable to see the message (data) at the Kafka consumer end. I also checked the offset count/status for the confluent-new topic, and it is not updating. It seems the producer code has some problem.
Any pointers would be helpful.
Meanwhile, the producer code below works. Here the POJO, User, is generated by avro-tools.
    public class AvroProducer {
        public static void main(String[] args) throws ExecutionException, InterruptedException {
            Properties props = new Properties();
            /* kafkaParams.put("auto.offset.reset", "smallest");
            kafkaParams.put("ZOOKEEPER_HOST", "bihdp01");*/
            props.put("bootstrap.servers", "localhost:9092");
            props.put("ZOOKEEPER_HOST", "localhost");
            props.put("acks", "all");
            props.put("retries", 0);
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put("schema.registry.url", "http://localhost:8081");
            String topic = "confluent-new";
            Producer<String, User> producer = new KafkaProducer<String, User>(props);
            User user = new User();
            user.setUID("0908");
            user.setUserName("User data10");
            user.setCompany("HCL");
            user.setAge(20);
            user.setLocation("Noida");
            ProducerRecord<String, User> record = new ProducerRecord<String, User>(topic, (String) user.getUID(), user);
            producer.send(record).get();
            System.out.println("Sent");
        }
    }
P.S. My requirement is to send JSON data received from a source Kafka topic to a destination Kafka topic in Avro format. First I infer the Avro schema from the received JSON data using avro4s and register the schema with Schema Registry. Next I pull the data from the received JSON, populate a GenericRecord instance, and send that GenericRecord to the Kafka topic using KafkaAvroSerializer. At the consumer end I will use KafkaAvroDeserializer to deserialize the received Avro data.
While looking for a solution I tried Thread.sleep(1000), and it fixed my problem. I also tried producer.send(record).get(), and this fixed the problem too. After going through the documentation I came across the code snippet below, which hints at the solution.
    // When you're finished producing records, you can
    // flush the producer to ensure it has all been written to Kafka and
    // then close the producer to free its resources.
    finally {
        producer.flush();
        producer.close();
    }
This is the best way to fix this problem.
Please try adding get() to the send call in the first producer:
    producer.send(recordData).get();
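Both fixes work for the same reason: send() is asynchronous and returns a java.util.concurrent.Future, so a main method that returns right after send() can end the program before the record has left the client's buffer. The toy model below shows only that mechanism and uses no Kafka classes. fakeSend is a hypothetical stand-in for producer.send, and the 50 ms delay is an arbitrary value standing in for network time.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AsyncSendSketch {
    // Hypothetical stand-in for producer.send(): the work happens on a
    // background thread and the caller gets a Future back immediately.
    static Future<Long> fakeSend(ExecutorService io, String payload, long offset) {
        return io.submit(() -> {
            if (payload == null) {
                throw new IllegalArgumentException("payload is null");
            }
            Thread.sleep(50); // pretend network round trip
            return offset;
        });
    }

    // Blocks on get() until the background send has finished or failed.
    static String sendAndWait(String payload, long offset) {
        ExecutorService io = Executors.newSingleThreadExecutor();
        try {
            long acked = fakeSend(io, payload, offset).get();
            return "acked at offset " + acked;
        } catch (ExecutionException e) {
            // A failure inside the send surfaces here, wrapped in ExecutionException.
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            io.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndWait("hello", 42L));
        System.out.println(sendAndWait(null, 43L));
    }
}
```

Calling get() after every record makes each send synchronous, which costs throughput. When many records are sent, a single flush() or close() at the end, as in the documentation snippet, waits for all of them at once.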

java Kafka Producer- send large amount of messages

I am trying to send 300,000 messages to a Kafka topic, each about 1500 bytes in size.
The first 299,996 messages are sent in less than a minute, but then Kafka hangs for a while and doesn't send the rest of the messages. Very strange.
My Kafka producer configuration:
    private void initKafka() {
        Properties configProperties = new Properties();
        configProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaLocation);
        configProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        configProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        kafkaProducer = new KafkaProducer<String, String>(configProperties);
    }
My code:
    for (ResponseDocument responseDocument : documents.getDocuments()) {
        try {
            LinkedHashMap<String, Collection<? extends Object>> fields = convertToMap(responseDocument);
            String jsonDoc = objectMapper.writeValueAsString(fields);
            String docId = responseDocument.getFirstValueAsString(".id");
            ProducerRecord<String, String> record = new ProducerRecord<String, String>(kafkaTopic, docId, jsonDoc);
            kafkaProducer.send(record);
        } catch (Exception e) {
            LOGGER.error(ErrorCode.DATA_ACCESS_ERROR, "Failed to send document with .id {0}: {1}", responseDocument.getId(), e.getMessage());
        }
    }
I tried playing a little with the configuration, the number of messages sent (changed to 250,000), and a few other things.
Any ideas?
Thanks in advance.

Failed to read one message with java-based Kafka consumer

I am trying to implement a Java Kafka consumer. I use Kafka server version 0.9.
It's for testing purposes, so all I need to do is read one message.
    public static ConsumerRecords<String, String> readFromKafka() {
        ConsumerRecords<String, String> records = null;
        try {
            Properties kafkaProps = new Properties();
            kafkaProps.put("bootstrap.servers", "<KAFKA_SERVER_HOST>:9092");
            kafkaProps.put("auto.commit.enable", "false");
            kafkaProps.put("value.deserializer", StringDeserializer.class.getName());
            kafkaProps.put("key.deserializer", StringDeserializer.class.getName());
            kafkaProps.put("client.id", "testScore0");
            kafkaProps.put("group.id", "testScore1");
            kafkaProps.put("auto.offset.reset", "latest");
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(kafkaProps);
            consumer.subscribe(Arrays.asList("my_topic"));
            records = consumer.poll(0);
        } catch (Exception e) {
            logger.error("Can not read from kafka", e);
        }
        return records;
    }
The returned records object is empty.
When I run a command-line Kafka consumer on my local machine against the same KAFKA_SERVER_HOST, I do get messages.
Change the poll timeout in
    records = consumer.poll(0);
to something greater than 0; try 100:
    records = consumer.poll(100);
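Even with a larger timeout, a single poll can come back empty, for example when the consumer is still joining its group. A common pattern is to keep polling until something arrives or a deadline passes. The helper below sketches that loop in plain Java so it can run without a broker. pollUntil and its parameters are hypothetical names, and the fake poll in main stands in for the real client call.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

class PollUntilSketch {
    // Calls pollOnce repeatedly until hasData accepts the result or maxWait has elapsed.
    // Returns the last result, which may still be empty if the deadline was hit.
    static <T> T pollUntil(Supplier<T> pollOnce, Predicate<T> hasData, Duration maxWait) {
        long deadline = System.nanoTime() + maxWait.toNanos();
        T batch = pollOnce.get();
        while (!hasData.test(batch) && System.nanoTime() - deadline < 0) {
            batch = pollOnce.get();
        }
        return batch;
    }

    public static void main(String[] args) {
        // Fake poll: empty on the first two calls, one record on the third.
        int[] calls = {0};
        Supplier<List<String>> fakePoll = () -> ++calls[0] < 3
                ? Collections.<String>emptyList()
                : Collections.singletonList("msg-1");
        List<String> got = pollUntil(fakePoll, batch -> !batch.isEmpty(), Duration.ofSeconds(5));
        System.out.println(got + " after " + calls[0] + " polls");
    }
}
```

With the real client, pollOnce could be `() -> consumer.poll(Duration.ofMillis(500))` (or `consumer.poll(500)` on clients older than 2.0) and hasData could be `records -> !records.isEmpty()`. The real poll blocks for up to its timeout, so the loop does not spin the way it would with a non-blocking fake.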
