Note to duplicate markers: I DID check out the other question, but it does not answer my specific question below.
Imagine I have a Kafka topic on a single server with only one partition, so it is much like a queue.
Now let's assume I want 100 listeners waiting to accept values from the queue.
By design, if all 100 consumers are in a single group, the contents of the log (or queue here) will be distributed among the consumers, so the operation should finish in 1/100th of the time.
The problem is that the Spring Kafka listener is only configured with the topic name.
@Service
public class Consumer {

    @KafkaListener(topics = "${app.topic}")
    public void receive(@Payload String message,
                        @Headers MessageHeaders headers) {
        System.out.println("Received message=" + message);
        headers.keySet().forEach(key -> System.out.println(key + "->" + headers.get(key)));
    }
}
I can't seem to get Kafka to spawn 100 consumers to process messages from the "queue" (log).
How can it be done?
Check out this answer for an understanding of Kafka consumers: In Apache Kafka why can't there be more consumer instances than partitions?
To properly distribute messages within a single consumer group you must have more than one partition. Once you find the correct partition count for your load, I would use Spring Cloud Stream to better manage your concurrency and consumer group assignment.
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-stream-kafka</artifactId>
</dependency>
Sample of a sink:
@SpringBootApplication
@EnableBinding(Sink.class)
public class LoggingConsumerApplication {

    public static void main(String[] args) {
        SpringApplication.run(LoggingConsumerApplication.class, args);
    }

    @StreamListener(Sink.INPUT)
    public void handle(Person person) {
        System.out.println("Received: " + person);
    }

    public static class Person {
        private String name;

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return this.name;
        }
    }
}
Concurrency settings
spring:
  cloud:
    stream:
      bindings:
        input:
          destination: <topic-name>
          group: <consumer-group>
          consumer:
            headerMode: raw
            partitioned: true
            concurrency: 20
More information is available here: https://cloud.spring.io/spring-cloud-stream/
Related
I'm new to Kafka and want to persist data from Kafka topics to database tables (each topic flows to a specific table). I know Kafka Connect exists and could be used to achieve this, but there are reasons why this approach is preferred.
Unfortunately, only one topic is being written to the database. Kafka does not seem to run all processors' process() concurrently. Either MyFirstData is written to the database or MySecondData, but never both at the same time.
According to my reading, there is the option of overriding init() from the Kafka Streams Processor interface, which offers context.forward(); I'm not sure if this will help or how to use it in my use case.
I use Spring Cloud Stream (but I got the same behaviour with Kafka Streams DSL and Processor API implementations).
My code snippet:
Configuring the consumers:
@Configuration
@RequiredArgsConstructor
public class DatabaseProcessorConfiguration {

    private final MyFirstDao myFirstDao;
    private final MySecondDao mySecondDao;

    @Bean
    public Consumer<KStream<GenericData.Record, GenericData.Record>> myFirstDbProcessor() {
        return stream -> stream.process(() -> new MyFirstDbProcessor(myFirstDao));
    }

    @Bean
    public Consumer<KStream<GenericRecord, GenericRecord>> mySecondDbProcessor() {
        return stream -> stream.process(() -> new MySecondDbProcessor(mySecondDao));
    }
}
MyFirstDbProcessor and MySecondDbProcessor are implemented analogously to this:
@Slf4j
@RequiredArgsConstructor
public class MyFirstDbProcessor implements Processor<GenericData.Record, GenericData.Record, Void, Void> {

    private final MyFirstDao myFirstDao;

    @Override
    public void process(Record<GenericData.Record, GenericData.Record> record) {
        CdcRecordAdapter adapter = new CdcRecordAdapter(record.key(), record.value());
        MyFirstTopicKey myFirstTopicKey = adapter.getKeyAs(MyFirstTopicKey.class);
        MyFirstTopicValue myFirstTopicValue = adapter.getValueAs(MyFirstTopicValue.class);
        MyFirstData data = PersistenceMapper.map(myFirstTopicKey, myFirstTopicValue);
        switch (myFirstTopicValue.getCrudOperation()) {
            case UPDATE, INSERT -> myFirstDao.persist(data);
            case DELETE -> myFirstDao.delete(data);
            default -> System.err.println("unimplemented CDC operation streamed by kafka");
        }
    }
}
My DAO implementations: I tried implementing MyFirstRepository with both JpaRepository and ReactiveCrudRepository, but the behaviour was the same. MySecondRepository is implemented analogously to MyFirstRepository.
@Component
@RequiredArgsConstructor
public class MyFirstDaoImpl implements MyFirstDao {

    private final MyFirstRepository myFirstRepository;

    @Override
    public MyFirstData persist(MyFirstData myFirstData) {
        Optional<MyFirstData> dataOptional = myFirstRepository.findById(myFirstData.getId());
        if (dataOptional.isPresent()) {
            var data = dataOptional.get();
            myFirstData.setCreatedDate(data.getCreatedDate());
        }
        return myFirstRepository.save(myFirstData);
    }

    @Override
    public void delete(MyFirstData myFirstData) {
        System.out.println("delete() from transaction detail dao called");
        myFirstRepository.delete(myFirstData);
    }
}
I have currently implemented an SQS listener in a Spring Boot project running on Fargate.
It's possible that under the hood the SqsAsyncClient, which appears to be a listener, is actually polling, though.
Separately, as a PoC, I implemented a Lambda function trigger on a different queue. This would be invoked when there are items in the queue and would post to my service. This seems unnecessarily complex to me, but it removes a single point of failure if I were to have only one instance of the service.
I guess my major point of confusion is whether I am needlessly worrying about polling vs. listening on an SQS queue and whether it matters.
Code for example purposes:
@Component
@Slf4j
@RequiredArgsConstructor
public class SqsListener {

    private final SqsAsyncClient sqsAsyncClient;
    private final Environment environment;
    private final SmsMessagingServiceImpl smsMessagingService;

    @PostConstruct
    public void continuousListener() {
        String queueUrl = environment.getProperty("aws.sqs.sms.queueUrl");
        Mono<ReceiveMessageResponse> responseMono = receiveMessage(queueUrl);
        Flux<Message> messages = getItems(responseMono);
        messages.subscribe(message -> disposeOfFlux(message, queueUrl));
    }

    protected Flux<Message> getItems(Mono<ReceiveMessageResponse> responseMono) {
        return responseMono.repeat().retry()
                .map(ReceiveMessageResponse::messages)
                .map(Flux::fromIterable)
                .flatMap(messageFlux -> messageFlux);
    }

    protected void disposeOfFlux(Message message, String queueUrl) {
        log.info("Inbound SMS Received from SQS with MessageId: {}", message.messageId());
        if (someConditionIsMet())
            deleteMessage(queueUrl, message);
    }

    protected Mono<ReceiveMessageResponse> receiveMessage(String queueUrl) {
        return Mono.fromFuture(() -> sqsAsyncClient.receiveMessage(
                ReceiveMessageRequest.builder()
                        .maxNumberOfMessages(5)
                        .messageAttributeNames("All")
                        .queueUrl(queueUrl)
                        .waitTimeSeconds(10)
                        .visibilityTimeout(30)
                        .build()));
    }

    protected void deleteMessage(String queueUrl, Message message) {
        sqsAsyncClient.deleteMessage(DeleteMessageRequest.builder()
                        .queueUrl(queueUrl)
                        .receiptHandle(message.receiptHandle())
                        .build())
                .thenAccept(deleteMessageResponse -> log.info("deleted message with handle {}", message.receiptHandle()));
    }
}
I'm using Spring Kafka to connect to a Kafka cluster, and I have a requirement to execute a piece of code after all the topic partitions of all the topics are assigned.
Take the following code for example:
@Component
public class MyKafkaMessageListener1 {

    @KafkaListener(topics = "topic1", groupId = "cg1")
    public void handleMessage(String message) {
        System.out.println("Msg:" + message);
    }
}

@Component
public class MyKafkaMessageListener2 {

    @KafkaListener(topics = "topic2, topic3", groupId = "cg2")
    public void handleMessage(String message) {
        System.out.println("Msg:" + message);
    }
}
I need to execute it after all topic-partitions of topic1, topic2 and topic3 are assigned to the listener threads. Is it possible? Is there such an event handler in Kafka or Spring Kafka?
Implement ConsumerSeekAware (or extend AbstractConsumerSeekAware), or add a ConsumerRebalanceListener and you will receive a call to onPartitionsAssigned().
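For example, a minimal sketch of the first listener extended to receive the assignment callback (the logging is illustrative; each listener only sees its own assignments after a rebalance, so combining the callbacks from all listeners is still up to you):
@Component
public class MyKafkaMessageListener1 extends AbstractConsumerSeekAware {

    @KafkaListener(topics = "topic1", groupId = "cg1")
    public void handleMessage(String message) {
        System.out.println("Msg:" + message);
    }

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        // keep the base class bookkeeping, then run your own code
        super.onPartitionsAssigned(assignments, callback);
        System.out.println("Assigned to this listener: " + assignments.keySet());
    }
}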
I have a "task" application that is short-lived and produces messages to Kafka based on statuses from a database. I'm using Spring Cloud Stream to produce the messages using the format below. I followed this format from the Spring Cloud Stream documentation to send arbitrary data to the output binding.
private EmitterProcessor<Message<GenericRecord>> processor;

@Override
public void run(ApplicationArguments arg0) {
    // ... create Message<GenericRecord> producerRecord
    this.processor.onNext(producerRecord);
}

@Bean
public Supplier<Flux<Message<GenericRecord>>> supplier() {
    return () -> this.processor;
}

public static void main(String[] args) {
    ConfigurableApplicationContext ctx = SpringApplication.run(Application.class, args);
    ctx.close();
}
The application runs, creates the records, runs onNext(), and then exits. I then look to see if any messages have been published but there are none on the topic. I then added a Thread.sleep(10000) after each message is produced and the messages end up on the topic.
After looking at the documentation for Reactor, I didn't see any clear way to accomplish this. Is there a way to wait for the EmitterProcessor to finish publishing the messages before the Spring application exits?
Do you have a specific reason to use the EmitterProcessor? I think this use case can be solved by using StreamBridge. For example:
@Autowired
StreamBridge streamBridge;

@Override
public void run(ApplicationArguments arg0) {
    // ... create Message<GenericRecord> producerRecord
    this.streamBridge.send("process-out-0", producerRecord);
}
Then provide configuration for: spring.cloud.stream.source: process
You can find more details on StreamBridge in the ref docs.
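For example, a minimal application.yml sketch for that setup (the destination name my-output-topic is an illustrative assumption):
spring:
  cloud:
    stream:
      source: process
      bindings:
        process-out-0:
          destination: my-output-topic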
We have an application which will be communicating with different services over Kafka. For example, I need to have 3 consumers (most likely with the same groupId) and 3 producers, each reading from and writing to different topics. I would like to make use of the KafkaProperties class to do this.
Listener
@Component
@RequiredArgsConstructor // lombok
public class MyKafkaListener {

    @NonNull
    private final EventProcessingService eventProcessingService;

    @NonNull
    private final KafkaProperties kafkaProperties;

    @KafkaListener(topics = "#{__kafkaProperties.???}")
    public void listenMessageA(final ConsumerRecord<String, MessageA> consumerRecord) {
        // Delegate to eventProcessingService
    }

    @KafkaListener(topics = "#{__kafkaProperties.???}")
    public void listenMessageB(final ConsumerRecord<String, MessageB> consumerRecord) {
        // Delegate to eventProcessingService
    }

    @KafkaListener(topics = "#{__kafkaProperties.???}")
    public void listenMessageC(final ConsumerRecord<String, MessageC> consumerRecord) {
        // Delegate to eventProcessingService
    }
}
Publisher
@Component
@RequiredArgsConstructor // lombok
public class MyKafkaPublisher<T> {

    @NonNull
    private final KafkaTemplate<String, T> kafkaTemplate;

    @NonNull
    private final KafkaProperties kafkaProperties;

    public void sendMessageX(final MessageX messageX) {
        kafkaTemplate.send(kafkaProperties.???, messageX);
    }

    public void sendMessageY(final MessageY messageY) {
        kafkaTemplate.send(kafkaProperties.???, messageY);
    }

    public void sendMessageZ(final MessageZ messageZ) {
        kafkaTemplate.send(kafkaProperties.???, messageZ);
    }
}
A snippet from application.yml
spring:
  kafka:
    bootstrap-servers: localhost:9092
    consumer:
      groupId: myGroupId
      properties:
        someConsumerProp: someValue
        # Should I add my consumer topic names here?
        topicForA: myFavTopicA
        topicForB: myFavTopicB
        topicForC: myFavTopicC
    producer:
      retries: 10
      properties:
        someProducerProp: someValue
        # Should I add my producer topic names here?
        topicForX: myFavTopicX
        topicForY: myFavTopicY
        topicForZ: myFavTopicZ
    properties:
      someCommonProp: someValue
      # Or maybe all topic names here?
      listener.topicForA: myFavTopicA
      listener.topicForB: myFavTopicB
      listener.topicForC: myFavTopicC
      publisher.topicForX: myFavTopicX
      publisher.topicForY: myFavTopicY
      publisher.topicForZ: myFavTopicZ
    template:
      # Bonus question: What is the usage of this property?
      default-topic: myDefTopic
I want to know what would be the best way to replace the ??? in the above classes, as suggested by the authors of spring-kafka, so that I don't have to write an extra @ConfigurationProperties class or use @Value anywhere, while still keeping the topic names clear from reading application.yml itself. Or is there a different way provided by the authors to address such a scenario?
The Boot team does not consider auto-configuration properties classes to be public, and they can change at any time, so beware of using them in application code.
Overloading the Kafka consumer/producer/admin properties like that is a bit dirty (and Kafka itself might complain about "unknown" properties in its configuration).
It would be better to create your own @ConfigurationProperties class.
To specifically answer your question, see buildProducerProperties().
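For example, a minimal sketch of such a dedicated properties class (the prefix app.kafka, the class name and the field names are illustrative assumptions, not part of Boot's API):
@Component
@ConfigurationProperties(prefix = "app.kafka")
public class AppKafkaTopics {

    // bound from app.kafka.topic-for-a / app.kafka.topic-for-x in application.yml
    private String topicForA;
    private String topicForX;

    public String getTopicForA() { return topicForA; }
    public void setTopicForA(String topicForA) { this.topicForA = topicForA; }

    public String getTopicForX() { return topicForX; }
    public void setTopicForX(String topicForX) { this.topicForX = topicForX; }
}
The listener could then reference it with SpEL, e.g. @KafkaListener(topics = "#{@appKafkaTopics.topicForA}"), and the publisher could inject this bean instead of KafkaProperties.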