I am trying to send 300,000 messages to a Kafka topic, each about 1,500 bytes in size.
The first 299,996 messages are sent in less than a minute, but then Kafka hangs for a while and doesn't send the rest of the messages. Very strange.
My Kafka producer configuration:
private void initKafka() {
Properties configProperties = new Properties();
configProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaLocation);
configProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
configProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
kafkaProducer = new KafkaProducer<String, String>(configProperties);
}
My code:
for (ResponseDocument responseDocument : documents.getDocuments()) {
try {
LinkedHashMap<String, Collection<? extends Object>> fields = convertToMap(responseDocument);
String jsonDoc = objectMapper.writeValueAsString(fields);
String docId = responseDocument.getFirstValueAsString(".id");
ProducerRecord<String, String> record = new ProducerRecord<String, String>(kafkaTopic, docId, jsonDoc);
kafkaProducer.send(record);
} catch (Exception e) {
LOGGER.error(ErrorCode.DATA_ACCESS_ERROR, "Failed to send document with .id {0}: {1}",responseDocument.getId(), e.getMessage());
}
}
I tried playing with the configuration, the number of messages sent (changed to 250,000), and a few other things.
Any ideas?
Thanks in advance.
I am using Kafka Transactional producer to post atomically to 2 topics on a broker. My code looks similar to this:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("transactional.id", "my-transactional-id");
Producer<String, String> producer = new KafkaProducer<>(props, new StringSerializer(), new StringSerializer());
ProducerRecord<String, String> record1 = new ProducerRecord("topic-1", null, (Object) null, payload, headerList);
ProducerRecord<String, String> record2 = new ProducerRecord("topic-2", null, (Object) null, payload, headerList);
List<ProducerRecord<String, String>> recordList = Arrays.asList(record1, record2);
producer.initTransactions();
try {
producer.beginTransaction();
// Send every record inside the same transaction.
for (ProducerRecord<String, String> record : recordList) {
producer.send(record);
}
producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
// We can't recover from these exceptions, so our only option is to close the producer and exit.
producer.close();
} catch (KafkaException e) {
// For all other exceptions, just abort the transaction and try again.
producer.abortTransaction();
}
producer.close();
Now, in order to test the atomicity while posting to both the topics, I deleted "topic-2". I am expecting the transaction to fail completely. But strangely after several retries it commits transaction successfully to "topic-1".
Also, I am seeing continuous error logs with messages:
Error while fetching metadata with correlation id 123 :
{topic-2=UNKNOWN_TOPIC_OR_PARTITION}
But eventually it says
Transition from state IN_TRANSACTION to COMMITTING_TRANSACTION
and then posts successfully to "topic-1".
I am not sure why I am seeing this behaviour. What could be going wrong, and is this behaviour expected?
We have a producer-consumer environment and use Spring Boot for our project.
Kafka is configured with the following class:
@Configuration
@EnableKafka
public class DefaultKafkaConsumerConfig {
@Value("${spring.kafka.bootstrap-servers}")
private String bootstrapServers;
@Value("${spring.kafka.bootstrap-servers-group}")
private String bootstrapServersGroup;
@Bean
public ConsumerFactory<String,String> consumerDefaultFactory(){
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, IntegerDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.GROUP_ID_CONFIG, bootstrapServersGroup);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);
return new DefaultKafkaConsumerFactory<>(props);
}
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerDefaultContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerDefaultFactory());
return factory;
}
}
SCENARIO: We write values to Kafka topics. Consider a topic that carries live data, where each record has a status such as "live:0" for a completed event and "live:1" for a live event. When an event goes live, it is updated and written to the topic, and we process the event based on what we read from this topic.
ISSUE: When an event goes live, I read the data with "live:1" from the topic and process it. Later the event is updated and new data is written to the topic.
I am able to read the new data, but I also receive the old data along with it. Because I get both old and new data at the same time, my event is affected: sometimes it ends up live and sometimes completed.
Can anyone give any suggestions on this?
Why am I getting already-committed data together with the newly updated data?
Is there anything I am missing in the configuration?
You may want to check a couple of things:
1. number of partitions
2. number of consumers
Does this also mean that you are re-writing the consumed message to the topic again, with a new status?
try {
ListenableFuture<SendResult<String, String>> futureResult = this.kafkaTemplate.send(topicName, message);
futureResult.addCallback(new ListenableFutureCallback<SendResult<String, String>>() {
@Override
public void onSuccess(SendResult<String, String> result) {
log.info("Message successfully sent to topic {} with offset {} ", result.getRecordMetadata().topic(), result.getRecordMetadata().offset());
}
@Override
public void onFailure(Throwable ex) {
FAILMESSAGELOGGER.info("{},{}", topicName, message);
log.info("Unable to send Message to topic {} due to ", topicName, ex);
}
});
} catch (Exception e) {
log.error("Outer Exception occured while sending message {} to topic {}", new Object[] { message, topicName, e });
FAILMESSAGELOGGER.info("{},{}", topicName, message);
}
This is what we have.
Using confluent-oss-5.0.0-2.11
My Kafka Producer code is
public class AvroProducer {
public static void main(String[] args) throws ExecutionException, InterruptedException {
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("ZOOKEEPER_HOST", "localhost");
//props.put("acks", "all");
props.put("retries", 0);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("schema.registry.url", "http://localhost:8081");
String topic = "confluent-new";
Schema.Parser parser = new Schema.Parser();
// I will get below schema string from SCHEMA REGISTRY
Schema schema = parser.parse("{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"userName\",\"type\":\"string\"},{\"name\":\"uID\",\"type\":\"string\"},{\"name\":\"company\",\"type\":\"string\",\"default\":\"ABC\"},{\"name\":\"age\",\"type\":\"int\",\"default\":0},{\"name\":\"location\",\"type\":\"string\",\"default\":\"Noida\"}]}");
Producer<String, GenericRecord> producer = new KafkaProducer<String, GenericRecord>(props);
GenericRecord record = new GenericData.Record(schema);
record.put("uID", "06080000");
record.put("userName", "User data10");
record.put("company", "User data10");
record.put("age", 12);
record.put("location", "User data10");
ProducerRecord<String, GenericRecord> recordData = new ProducerRecord<String, GenericRecord>(topic, "ip", record);
producer.send(recordData);
System.out.println("Message Sent");
}
}
The producer code seems to be OK, and I can see "Message Sent" on the console.
Kafka Consumer code is:
public class AvroConsumer {
public static void main(String[] args) throws ExecutionException, InterruptedException {
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("ZOOKEEPER_HOST", "localhost");
props.put("acks", "all");
props.put("retries", 0);
props.put("group.id", "consumer1");
props.put("auto.offset.reset", "latest");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put("schema.registry.url", "http://localhost:8081");
String topic = "confluent-new";
KafkaConsumer<String, GenericRecord> consumer = new KafkaConsumer<String, GenericRecord>(props);
consumer.subscribe(Arrays.asList(topic));
while(true){
ConsumerRecords<String, GenericRecord> recs = consumer.poll(10000);
for (ConsumerRecord<String, GenericRecord> rec : recs) {
System.out.printf("{AvroUtilsConsumerUser}: Recieved [key= %s, value= %s]\n", rec.key(), rec.value());
}
}
}
}
I am unable to see the message (data) on the Kafka consumer end. I also checked the offset count/status for the confluent-new topic, and it is not updating. It seems the producer code has some problem.
Any pointers would be helpful.
Meanwhile, the producer code below works; here the POJO, i.e. User, is generated by avro-tools.
public class AvroProducer {
public static void main(String[] args) throws ExecutionException, InterruptedException {
Properties props = new Properties();
/*kafkaParams.put("auto.offset.reset", "smallest");
kafkaParams.put("ZOOKEEPER_HOST", "bihdp01");*/
props.put("bootstrap.servers", "localhost:9092");
props.put("ZOOKEEPER_HOST", "localhost");
props.put("acks", "all");
props.put("retries", 0);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("schema.registry.url", "http://localhost:8081");
String topic = "confluent-new";
Producer<String, User> producer = new KafkaProducer<String, User>(props);
User user = new User();
user.setUID("0908");
user.setUserName("User data10");
user.setCompany("HCL");
user.setAge(20);
user.setLocation("Noida");
ProducerRecord<String, User> record = new ProducerRecord<String, User>(topic, (String) user.getUID(), user);
producer.send(record).get();
System.out.println("Sent");
}
}
P.S. My requirement is to send JSON data received from a source Kafka topic to a destination Kafka topic in Avro format. First I infer the Avro schema from the received JSON data using AVRO4S and register the schema with the Schema Registry. Next I pull the data from the received JSON, populate a GenericRecord instance, and send this GenericRecord instance to the Kafka topic using KafkaAvroSerializer. At the consumer end I will use KafkaAvroDeserializer to deserialize the received Avro data.
In the course of finding a solution, I tried Thread.sleep(1000) and it fixed my problem. I also tried producer.send(record).get(), which fixed the problem too. After going through the documentation, I came across the code snippet below, which hints at the solution.
// When you're finished producing records, you can
// flush the producer to ensure it has all been written to Kafka and
// then close the producer to free its resources.
finally {
producer.flush();
producer.close();
}
This is the best way to fix this problem.
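To make the reasoning concrete: send() on the real producer is asynchronous, so a main method that returns right after send() can exit while the record is still sitting in the client's buffer. Sleeping, calling get(), or flushing all give the record time to leave. Below is a toy model of that behaviour using only the JDK. It is not the Kafka client; the class name ToyBufferedProducer and its methods are made up for illustration, and the buffering is deliberately simplified.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT the Kafka client, just an illustration of
// "send() buffers, flush() delivers". All names here are hypothetical.
class ToyBufferedProducer implements AutoCloseable {
    private final List<String> buffer = new ArrayList<>();    // accepted by send(), not yet delivered
    private final List<String> delivered = new ArrayList<>(); // stands in for the broker's log

    // Like an asynchronous send: returns immediately after buffering.
    void send(String message) {
        buffer.add(message);
    }

    // Pushes everything buffered so far to the "broker".
    void flush() {
        delivered.addAll(buffer);
        buffer.clear();
    }

    // Closing flushes first, so nothing buffered is lost.
    @Override
    public void close() {
        flush();
    }

    int deliveredCount() {
        return delivered.size();
    }

    public static void main(String[] args) {
        ToyBufferedProducer producer = new ToyBufferedProducer();
        try {
            producer.send("a");
            producer.send("b");
            // If the JVM exited here, both messages would still be in the buffer.
            System.out.println("delivered before flush: " + producer.deliveredCount());
        } finally {
            producer.flush();
            producer.close();
        }
        System.out.println("delivered after flush: " + producer.deliveredCount());
    }
}
```

In the first AvroProducer above, the equivalent change would be to wrap the send in try/finally and call producer.flush() and producer.close() in the finally block, as in the documentation snippet.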
Please try adding get() to the send call in the first producer:
producer.send(recordData).get();
The producer code below reads an .mp4 video file from disk and sends it to Kafka. It apparently works, since it prints "Message sent to the Kafka Topic java_in_use_topic Successfully", but consumer.poll returns nothing:
@RestController
@RequestMapping(value = "/javainuse-kafka/")
public class ApacheKafkaWebController {
@GetMapping(value = "/producer")
public String producer(@RequestParam("message") String message) {
Map<String, Object> props = new HashMap<>();
// list of host:port pairs used for establishing the initial connections to the Kafka cluster
props.put(org.apache.kafka.clients.producer.ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
"localhost:9092");
props.put(org.apache.kafka.clients.producer.ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(org.apache.kafka.clients.producer.ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
Producer<String, byte[]> producer = new KafkaProducer<>(props);
Path path = Paths.get("C:/kafka-picture-consumer/SampleVideo_1280x720_1mb.mp4");
ProducerRecord<String, byte[]> record = null;
try {
record = new ProducerRecord<>("topiccc", "keyyyyy", Files.readAllBytes(path));
} catch (IOException e) {
e.printStackTrace();
}
producer.send(record);
producer.close();
return "Message sent to the Kafka Topic java_in_use_topic Successfully";
}
The consumer code which will be used in a servlet:
public class ConsumerService {
public byte[] consumer(){
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("topiccc"));
ConsumerRecords<String, byte[]> records = consumer.poll(100);
System.out.println("ISSSSSSSSSSSSSSSSSSSSSSSSSSSSS EMPTYYYYYYYYYY:"+String.valueOf(records.isEmpty()));
return records.iterator().next().value();
}
}
There can be many possible reasons for this:
Your producer is not sending the message. You can check this by adding a callback to the send call and printing the exception. If the exception is null, the send() was successful.
producer.send(record, (recordMetadata, exception) -> {
System.err.println(exception);
});
Since you are sending an mp4 file, your Kafka broker and/or topic configuration might not allow such a large message.
Check the max.message.bytes (topic) and message.max.bytes (broker) settings. If the message is too large, you will get a RecordTooLargeException.
You have to wait until the producer has completely produced the message.
You need to set auto.offset.reset to earliest in your consumer configuration. This ensures that if no committed offset exists for that topic, the consumer starts from the first message. With the default (latest), it will instead wait for the next message.
Your poll duration is short; you may need to increase it.
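As a rough illustration of the auto.offset.reset point, here is a sketch that builds the consumer settings from the question with plain string keys, so it needs only the JDK. The class name ConsumerPropsSketch and the group id in main are hypothetical. Note that auto.offset.reset only takes effect when the group has no committed offset, so for a test it may help to use a group id that has not consumed from the topic before.

```java
import java.util.Properties;

// Sketch only: builds the same kind of Properties the question passes to KafkaConsumer.
class ConsumerPropsSketch {
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        // Start from the beginning of the topic when the group has no committed offset.
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // "video-test-group" is a made-up group id for illustration.
        Properties props = build("localhost:9092", "video-test-group");
        System.out.println(props.getProperty("auto.offset.reset"));
    }
}
```

The resulting Properties object can be passed to new KafkaConsumer<>(props) exactly as in the ConsumerService above.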
I am trying to implement a Java Kafka consumer. I use Kafka server version 0.9.
It's for test purposes, so all I have to do is read one message.
public static ConsumerRecords<String, String> readFromKafka() {
ConsumerRecords<String, String> records = null;
try {
Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "<KAFKA_SERVER_HOST>:9092");
kafkaProps.put("auto.commit.enable", "false");
kafkaProps.put("value.deserializer", StringDeserializer.class.getName());
kafkaProps.put("key.deserializer", StringDeserializer.class.getName());
kafkaProps.put("client.id", "testScore0");
kafkaProps.put("group.id", "testScore1");
kafkaProps.put("auto.offset.reset", "latest");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(kafkaProps);
consumer.subscribe(Arrays.asList("my_topic"));
records = consumer.poll(0);
} catch (Exception e) {
logger.error("Can not read from kafka", e);
}
return records;
}
The returned records object is empty.
When I run a command-line Kafka consumer on my local machine against the same KAFKA_SERVER_HOST, I do get messages.
Change the poll timeout in
records = consumer.poll(0);
to something bigger than 0, for example 100:
records = consumer.poll(100);
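Even with a non-zero timeout, a single poll can come back empty, since the first calls may be spent joining the consumer group and getting partitions assigned. For a test that just needs one message, one option is to keep polling until something arrives or a deadline passes. The helper below is a JDK-only sketch of that loop; PollUntil and firstNonEmpty are made-up names, and the Supplier stands in for a call such as consumer.poll(100) whose result the caller has copied into a List.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Sketch only: retries a poll-like call until it yields data or the deadline passes.
class PollUntil {
    static <T> List<T> firstNonEmpty(Supplier<List<T>> pollOnce, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        do {
            List<T> batch = pollOnce.get(); // with a real consumer this call blocks up to its own timeout
            if (!batch.isEmpty()) {
                return batch;
            }
        } while (System.nanoTime() < deadline);
        return Collections.emptyList(); // nothing arrived in time
    }

    public static void main(String[] args) {
        // Fake source: empty on the first two calls, then one record.
        int[] calls = {0};
        Supplier<List<String>> fake = () -> ++calls[0] < 3
                ? Collections.<String>emptyList()
                : Collections.singletonList("hello");
        System.out.println(firstNonEmpty(fake, Duration.ofSeconds(5)));
    }
}
```

With the fake source in main, the loop returns on the third call; against a real broker the number of empty polls will vary.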