I am working on Windows on a tool in which a Spring Boot application produces messages to Kafka very frequently (a function produces messages continuously), and a Node.js application consumes them. The application creates so many topics in a day that the Kafka logs fill the disk within a week. I tried the log.retention.hours setting to delete old log segments, but I get "Error while deleting segments, java.nio.file.FileSystemException: The process cannot access the file because it is being used by another process".
NOTE: I haven't found a solution yet and I don't know why it is happening. I have two questions:
1) Do I need to configure anything in my application, or send some confirmation from the Spring Boot application to the Kafka server that I have finished producing to a topic, so that Kafka can delete it?
2) How can I connect to the Kafka server from other machines? (Kafka and ZooKeeper are hosted on one machine, and I produce to a topic from that same machine. Now I am trying to consume messages from another machine, but I cannot connect to the Kafka server.)
Below is the configuration I am using in the Spring Boot application to produce messages to topics.
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaProducerConfig {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> configProps = new HashMap<>();
        configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(configProps);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}
I have a requirement to process Kafka messages in an at-least-once fashion. Spring Kafka supports async acks starting from version 2.8.
I store the offsets received from Kafka in a map and commit them once message processing is done. This all works fine until I send an error event (poison pill). I am not able to commit the bad record inside the error handler, and because of this the consumer does not consume any new records after encountering a bad/malformed record.
Code for the Kafka listener container factory:
@Bean
public ConcurrentKafkaListenerContainerFactory<String, JsonNode> kafkaListenerContainerFactory(ConsumerFactory<String, JsonNode> kafkaConsumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, JsonNode> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(kafkaConsumerFactory);
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
    factory.setErrorHandler(errorHandler());
    factory.getContainerProperties().setAsyncAcks(true);
    return factory;
}
Error Handler Code:
#Bean("errorHandler")
public ErrorHandler errorHandler() {
log.info("Creating error handler");
return (thrownException, records) -> {
log.error("Inside error handler");
};
}
ErrorHandler is marked as deprecated. Even with CommonErrorHandler I am not able to overcome this issue.
You need to acknowledge all records before the next batch is fetched (even the poison pill).
You will have to ack it in the listener before throwing the exception.
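A minimal sketch of that idea, assuming a manual-ack listener for the same JsonNode payload; the topic name, the process(...) call and the exception handling are placeholders, not the original code:
@KafkaListener(topics = "my-topic", containerFactory = "kafkaListenerContainerFactory")
public void listen(ConsumerRecord<String, JsonNode> record, Acknowledgment ack) {
    try {
        process(record.value());   // your business logic
        ack.acknowledge();         // normal path
    } catch (RuntimeException ex) {
        ack.acknowledge();         // ack the poison pill so the partition is not blocked
        throw ex;                  // still propagate so the error handler can log/alert
    }
}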
My application needs to send different records to different topics, all on the same Kafka cluster. Since it uses a single cluster, creating one producer factory should be sufficient (let me know if I need more).
In my mind, I have two options.
Using the same KafkaTemplate for both topics and calling the send method with the topic, as below (kindly assume I use Spring's default Kafka producer configuration). Here we need to pass the topic for each call and we use the same Kafka template for multiple topics.
class ProducerService {

    @Autowired
    private KafkaTemplate<GenericRecord, GenericRecord> kafkaTemplate;

    public void send(String topic, GenericRecord key, GenericRecord value) {
        ListenableFuture<SendResult<GenericRecord, GenericRecord>> future = kafkaTemplate.send(topic, key, value);
    }
}
Using different Kafka templates for different topics. I want to know whether this setup will increase performance.
import org.apache.avro.generic.GenericRecord;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
@Configuration
public class KafkaConfig {

    @Value("${kafka.topic.first}")
    private String firstTopic;

    @Value("${kafka.topic.second}")
    private String secondTopic;

    @Bean(name = "firstKafkaTemplate")
    public KafkaTemplate<GenericRecord, GenericRecord> firstKafkaTemplate(ProducerFactory<GenericRecord, GenericRecord> defaultKafkaProducerFactory) {
        KafkaTemplate<GenericRecord, GenericRecord> kafkaTemplate = new KafkaTemplate<>(defaultKafkaProducerFactory);
        kafkaTemplate.setDefaultTopic(firstTopic);
        return kafkaTemplate;
    }

    @Bean(name = "secondKafkaTemplate")
    public KafkaTemplate<GenericRecord, GenericRecord> secondKafkaTemplate(ProducerFactory<GenericRecord, GenericRecord> defaultKafkaProducerFactory) {
        KafkaTemplate<GenericRecord, GenericRecord> kafkaTemplate = new KafkaTemplate<>(defaultKafkaProducerFactory);
        kafkaTemplate.setDefaultTopic(secondTopic);
        return kafkaTemplate;
    }
}
class ProducerService {

    @Autowired
    @Qualifier("firstKafkaTemplate")
    private KafkaTemplate<GenericRecord, GenericRecord> firstTopicTemplate;

    @Autowired
    @Qualifier("secondKafkaTemplate")
    private KafkaTemplate<GenericRecord, GenericRecord> secondTopicTemplate;

    public void send(String topic, GenericRecord key, GenericRecord value) {
        ListenableFuture<SendResult<GenericRecord, GenericRecord>> future;
        if ("first".equalsIgnoreCase(topic)) {
            future = firstTopicTemplate.sendDefault(key, value);
        } else if ("second".equalsIgnoreCase(topic)) {
            future = secondTopicTemplate.sendDefault(key, value);
        } else {
            throw new RuntimeException("topic is not configured");
        }
    }
}
Internally, the Kafka producer batches records and sends the batches to the brokers on a separate sender thread.
Which way is better for sending records in terms of performance, or is there no difference?
I am answering my own question based on throughput. When I processed the records, I ran into timeout issues.
A single producer is efficient in most cases
If you are facing timeout issues because records are queued much faster than they can be sent, tweak the parameters below. Note that these are dummy values; you have to test your application to find the values that suit it.
spring.kafka.producer.properties.[linger.ms]=100
spring.kafka.producer.properties.[batch.size]=100000
spring.kafka.producer.properties.[request.timeout.ms]=30000
spring.kafka.producer.properties.[delivery.timeout.ms]=200000
request.timeout.ms
This parameter controls how long the producer will wait for a reply from the server when sending data. If the timeout is reached without a reply, the producer will either retry sending or respond with an error (either through an exception or the send callback).
linger.ms
linger.ms controls the amount of time to wait for additional messages before sending the current batch. Kafka producer sends a batch of messages either when the current batch is full or when the linger.ms limit is reached. By default, the producer will send messages as soon as there is a sender thread available to send them, even if there’s just one message in the batch. By setting linger.ms higher than 0, we instruct the producer to wait a few milliseconds to add additional messages to the batch before sending it to the brokers. This increases latency but also increases throughput (because we send more messages at once, there is less overhead per message).
batch.size
When multiple records are sent to the same partition, the producer will batch them together. This parameter controls the amount of memory in bytes (not messages!) that will be used for each batch. When the batch is full, all the messages in the batch will be sent. However, this does not mean that the producer will wait for the batch to become full. The producer will send half-full batches and even batches with just a single message in them. Therefore, setting the batch size too large will not cause delays in sending messages; it will just use more memory for the batches. Setting the batch size too small will add some overhead because the producer will need to send messages more frequently.
delivery.timeout.ms
An upper bound on the time to report success or failure after a call to send() returns. This limits the total time that a record will be delayed prior to sending, the time to await acknowledgement from the broker (if expected), and the time allowed for retriable send failures. The producer may report a failure to send a record earlier than this config if either an unrecoverable error is encountered, the retries have been exhausted, or the record is added to a batch that reached an earlier delivery expiration deadline. The value of this config should be greater than or equal to the sum of request.timeout.ms and linger.ms.
If you are still facing timeout issues, then you need more producers
Increase the number of producers by increasing the number of threads for the same Kafka template.
To do this, when you create the producer factory, enable setProducerPerThread as shown below.
I have added a TaskExecutor to control the number of producers, since the number of producers equals the number of threads.
@Configuration
public class Conf {

    @Bean("kafkaTaskExecutor")
    public TaskExecutor getKafkaAsyncExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(15);
        executor.setWaitForTasksToCompleteOnShutdown(true);
        executor.setThreadNamePrefix("Kafka-Async-");
        return executor;
    }

    @Bean
    public KafkaTemplate<GenericRecord, GenericRecord> kafkaTemplate(ProducerFactory<GenericRecord, GenericRecord> producerFactory) {
        if (producerFactory instanceof DefaultKafkaProducerFactory<GenericRecord, GenericRecord> defaultFactory) {
            defaultFactory.setProducerPerThread(true);
        }
        return new KafkaTemplate<>(producerFactory);
    }
}
Don't change your Kafka code. Let it be the same. We are going to create a new layer to make it work.
class AsyncProducer {

    @Autowired
    private KafkaProducer producer;   // the existing producer/service that performs the actual send

    @Value("${topic.name}")
    private String topic;

    @Autowired
    @Qualifier("kafkaTaskExecutor")
    private TaskExecutor taskExecutor;

    public void sendAsync(GenericRecord key, GenericRecord value) {
        CompletableFuture.completedFuture(value)
                .thenAcceptAsync(val -> producer.send(topic, key, val), taskExecutor);
    }
}
With the above setup, 5 producers will initially be sending records; when the load gets high, this scales up to 15 producers.
Using multiple Kafka Templates
If you find that you are still not achieving your throughput, you can try increasing the number of templates. However, I did not try this, since I got the desired result with the second approach.
I'm struggling to understand my Kafka consumer's behaviour in some integration tests.
I have a Spring boot service which uses a default, autowired KafkaTemplate<String, String> to produce messages to a topic. In my integration tests, I create a KafkaConsumer in each test:
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(
Map.of( ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, KAFKA_CONTAINER.getBootstrapServers(),
ConsumerConfig.GROUP_ID_CONFIG, "test-consumer-group-" + UUID.randomUUID(),
ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, UUID.randomUUID().toString() ),
new StringDeserializer(), new StringDeserializer() );
consumer.subscribe( topics );
return consumer;
with the intent of having a test flow that looks something like:
Create a new consumer for the topics we're testing
Perform action under test which sends messages to some topics
Poll the topics of interest and verify the messages are there
Close consumer
My expectation was that, since a new consumer defaults to auto.offset.reset=latest, I would only get messages sent after the consumer was created, which is what I want here. However, my consumer never receives any messages! I have to set the consumer to earliest, but this is problematic since I don't want messages created by other tests interfering.
The messages don't have any kind of unique identifier on them, which makes consuming the entire topic each time a tricky proposition in terms of test verifications.
I've tried various permutations of auto committing, polling after subscribing but before running the test, and manual syncs, but nothing seems to work. How can I manage my test lifecycle as described above (or is it not possible)?
The kafka instance is managed using TestContainers in case that's relevant.
Can we use Spring Integration to configure directory polling for files such that:
With 2 servers configured, polling occurs on 1 server and the corresponding processing gets distributed between both servers.
Also, can we switch polling on or off on either server at runtime?
Edit -
I tried configuring a JDBC metadata store and running the two instances separately; I am able to poll and process, but intermittently get a DeadLockLoserDataAccessException.
Configuration below
@Bean
public MessageChannel fileInputChannel() {
    return new DirectChannel();
}

@Bean(PollerMetadata.DEFAULT_POLLER)
public PollerMetadata defaultPoller() {
    PollerMetadata pollerMetadata = new PollerMetadata();
    pollerMetadata.setMaxMessagesPerPoll(-1);
    pollerMetadata.setTrigger(new PeriodicTrigger(1000));
    return pollerMetadata;
}

@Bean
@InboundChannelAdapter(value = "fileInputChannel")
public FileReadingMessageSource fileReadingMessageSource(ConcurrentMetadataStore metadataStore) {
    FileReadingMessageSource source = new FileReadingMessageSource();
    source.setDirectory(new File("Mylocalpath"));
    // accept-once filter backed by the (JDBC) metadata store so both instances share state
    FileSystemPersistentAcceptOnceFileListFilter acceptOnce =
            new FileSystemPersistentAcceptOnceFileListFilter(metadataStore, "file-poller-");
    ChainFileListFilter<File> chainFilter = new ChainFileListFilter<>();
    chainFilter.addFilter(new RegexPatternFileListFilter(".*\\.txt"));
    chainFilter.addFilter(acceptOnce);
    source.setFilter(chainFilter);
    source.setUseWatchService(true);
    source.setWatchEvents(FileReadingMessageSource.WatchEventType.CREATE, FileReadingMessageSource.WatchEventType.MODIFY);
    return source;
}

@Bean
public IntegrationFlow processFileFlow() {
    return IntegrationFlows.from("fileInputChannel")
            .handle(service)
            .get();
}
Easily implementing a distributed solution really is one of the features of Spring Integration. You just need to add messaging middleware to your cluster infrastructure and have all the nodes connect to some destination for sending and receiving. A good example is a SubscribableJmsChannel, which you can simply declare in your application context; all the nodes of your cluster then subscribe to this channel for round-robin consumption from the JMS queue, and it no longer matters which node produces to the channel.
See more in docs: https://docs.spring.io/spring-integration/docs/current/reference/html/jms.html#jms-channel.
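For illustration only (this sketch is not from the original answer), such a channel could be declared roughly like this; the destination name and the injected ConnectionFactory are assumptions:
// JMS-backed subscribable channel: every cluster node declaring this bean
// competes for messages from the same queue (round-robin consumption).
@Bean
public JmsChannelFactoryBean fileProcessingChannel(ConnectionFactory connectionFactory) {
    JmsChannelFactoryBean factoryBean = new JmsChannelFactoryBean(true); // true = message-driven, i.e. subscribable
    factoryBean.setConnectionFactory(connectionFactory);
    factoryBean.setDestinationName("file.processing.channel");           // assumed queue name
    return factoryBean;
}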
Other examples of similar distributed channels are AMQP, Kafka, Redis and ZeroMQ.
You also can have a shared message store and use it in the QueueChannel definition: https://docs.spring.io/spring-integration/docs/current/reference/html/system-management.html#message-store
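A rough sketch of that variant (again my own illustration, assuming a JdbcChannelMessageStore bean is already configured; the group id is arbitrary):
// QueueChannel backed by a shared JDBC message store, so any node can consume
// what any other node has produced.
@Bean
public QueueChannel sharedFileChannel(JdbcChannelMessageStore messageStore) {
    return new QueueChannel(new MessageGroupQueue(messageStore, "shared-file-channel"));
}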
It is not clear what you mean about switching the poller at runtime, so I would suggest you start a new SO thread with much more info.
See rules as a guidance: https://stackoverflow.com/help/how-to-ask
I'm using the @KafkaListener annotation to consume topics in my application. My issue is that if I create a new topic in Kafka while my consumer is already running, the consumer does not seem to pick up the new topic, even if it matches the topicPattern I'm using. Is there a way to "refresh" the subscribed topics periodically, so that new topics are picked up and rebalanced across my running consumers?
I'm using Spring Kafka 1.2.2 with Kafka 0.10.2.0.
Regards
You can't dynamically add topics at runtime; you have to stop/start the container to start listening to new topics.
You can @Autowired the KafkaListenerEndpointRegistry and stop/start listeners by id.
You can also stop/start all listeners by calling stop()/start() on the registry itself.
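A small sketch of that approach (my illustration; the listener id is whatever you set in @KafkaListener(id = ...)):
@Autowired
private KafkaListenerEndpointRegistry registry;

// Restart a single listener container by its id so it re-subscribes and
// picks up topics created since it last started.
public void restartListener(String listenerId) {
    MessageListenerContainer container = registry.getListenerContainer(listenerId);
    container.stop();
    container.start();
}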
Actually it is possible.
It worked for me with Kafka 1.1.1.
Under the hood, Spring uses consumer.subscribe(topicPattern), and it then depends entirely on the Kafka client library whether messages from new topics will be seen by the consumer.
There is a consumer config property called metadata.max.age.ms, which is 5 minutes by default. It controls how often the client goes to the broker for metadata updates, which means new topics will not be seen by the consumer for up to 5 minutes. You can decrease this value (e.g. to 20 seconds) and should see the KafkaListener start to pick up messages from new topics more quickly.
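For example, a sketch of lowering that property when building a consumer factory (the property key is the real Kafka client setting; the surrounding configuration values are assumptions):
// Lower metadata.max.age.ms so a pattern-subscribed consumer refreshes topic
// metadata, and therefore notices new matching topics, every 20 seconds.
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
props.put(ConsumerConfig.METADATA_MAX_AGE_CONFIG, "20000");
ConsumerFactory<String, String> cf =
        new DefaultKafkaConsumerFactory<>(props, new StringDeserializer(), new StringDeserializer());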
The following way works well for me.
ContainerProperties containerProps = new ContainerProperties("topic1", "topic2");
containerProps.setMessageListener(new MessageListener<Integer, String>() {

    @Override
    public void onMessage(ConsumerRecord<Integer, String> message) {
        logger.info("received: " + message);
    }

});
KafkaMessageListenerContainer<Integer, String> container = createContainer(containerProps);
container.setBeanName("testAuto");
container.start();
ref: http://docs.spring.io/spring-kafka/docs/1.0.0.RC1/reference/htmlsingle/
In a practical application, I use a ConcurrentMessageListenerContainer instead of the single-threaded KafkaMessageListenerContainer.