Message duplication with PubSubInboundChannelAdapter in Spring Boot with Google PubSub - java

What I'm trying to do:
I am developing a Spring Boot application that should act as a subscriber for Google PubSub with an "exactly-once delivery" requirement. I have followed all the guidelines in the Google Cloud documentation for this:
https://cloud.google.com/pubsub/docs/exactly-once-delivery?_ga=2.207375771.-964146300.1676062320#pubsub_subscriber_exactly_once-java
However, instead of using the PubSub client library directly, I am using Spring Integration.
Problem:
The problem is that when I use the PubSubInboundChannelAdapter in my application, the same message is delivered multiple times with the same ID. My configuration class for the PubSubInboundChannelAdapter is shown below:
@Slf4j
@Configuration
public class PubSubConfig {

    @Value("${values.gcp.pubsub.subscription.name}")
    private String subscriptionName;

    /**
     * This bean enables serialization/deserialization of Java objects to JSON, allowing you
     * to utilize JSON message payloads in Cloud Pub/Sub.
     *
     * @param objectMapper the object mapper to use
     * @return a Jackson message converter
     */
    @Bean
    public JacksonPubSubMessageConverter jacksonPubSubMessageConverter(ObjectMapper objectMapper) {
        return new JacksonPubSubMessageConverter(objectMapper);
    }

    @Bean
    public MessageChannel pubsubInputChannel() {
        return new DirectChannel();
    }

    @Bean
    public PubSubInboundChannelAdapter messageChannelAdapter(
            @Qualifier("pubsubInputChannel") MessageChannel inputChannel,
            PubSubTemplate pubSubTemplate) {
        PubSubInboundChannelAdapter adapter =
                new PubSubInboundChannelAdapter(pubSubTemplate, subscriptionName);
        adapter.setOutputChannel(inputChannel);
        adapter.setPayloadType(MyObjectThatNeedBeUnique.class);
        adapter.setAckMode(AckMode.AUTO_ACK);
        return adapter;
    }
}
And the listener for this is:
@Slf4j
@RequiredArgsConstructor(onConstructor = @__(@Autowired))
@Component
public class CreateVMListener {

    @ServiceActivator(inputChannel = "pubsubInputChannel")
    public void createVMListener(@Payload MyObjectThatNeedBeUnique payload,
            @Header(GcpPubSubHeaders.ORIGINAL_MESSAGE) BasicAcknowledgeablePubsubMessage message)
            throws IOException, ExecutionException, InterruptedException, TimeoutException {
        log.info("Message arrived! Payload: " + payload.toString() + " | MessageId: " + message.getPubsubMessage().getMessageId());
        // Do some processing that takes 5 minutes
    }
}
In the application.yml, I have configured the max-ack-extension-period to be 600 seconds and did the same in the Google Cloud PubSub dashboard:
Screenshot of the google cloud pubsub dashboard showing the Exactly once delivery configuration and the Acknowledgement deadline in 600 seconds
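For reference, a minimal application.yml sketch of that setting (this is the Spring Cloud GCP subscriber property; the value simply mirrors the 600 seconds shown above):
spring:
  cloud:
    gcp:
      pubsub:
        subscriber:
          max-ack-extension-period: 600   # seconds, matching the subscription's acknowledgement deadline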
Log showing the duplication issue:
2023-02-05 23:22:20.055 [thread1] INFO - Message arrived! Payload: MyObjectThatNeedBeUnique (userId=432) | MessageID: 6846773022764035
2023-02-05 23:22:31.969 [thread2] INFO - Message arrived! Payload: MyObjectThatNeedBeUnique (userId=432) | MessageID: 6846773022764035
2023-02-05 23:23:33.028 [thread3] INFO - Message arrived! Payload: MyObjectThatNeedBeUnique (userId=432) | MessageID: 6846773022764035
2023-02-05 23:24:34.055 [thread4] INFO - Message arrived! Payload: MyObjectThatNeedBeUnique (userId=432) | MessageID: 6846773022764035
For example, in this log the same message was delivered four times within a short period, meaning four threads were processing the same data simultaneously.
Questions:
Why is this duplication happening?
What can I do to prevent duplication while still using the PubSubInboundChannelAdapter (which offers the more efficient streaming pull)?
Additional Information: Identifying the Issue
To find the source of the problem, I tried multiple approaches and eventually discovered that the issue was with the PubSubInboundChannelAdapter. I switched to a synchronous alternative, the PubSubMessageSource (shown in the code below), which fixed the message duplication. However, this solution has a disadvantage: it is synchronous and not as performant as the PubSubInboundChannelAdapter. Because of this, I want to know whether there is a way to use the PubSubInboundChannelAdapter without the duplication problem.
@Bean
@InboundChannelAdapter(channel = "pubsubInputChannel")
public MessageSource<Object> pubsubAdapter(PubSubTemplate pubSubTemplate) {
    PubSubMessageSource messageSource = new PubSubMessageSource(pubSubTemplate, createVMSubscriptionName);
    messageSource.setAckMode(AckMode.AUTO_ACK);
    messageSource.setPayloadType(CreateOrStartVM.class);
    messageSource.setBlockOnPull(true);
    return messageSource;
}

After consulting with some peers who were not experiencing the issue on their machines, I realized the problem was in my IntelliJ debug configuration.
The same message was being duplicated multiple times with the same ID because the PubSubInboundChannelAdapter is stream-based, and to make the IntelliJ Reactive Streams debugger work properly without this bug you need to enable Hooks.onOperatorDebug() in the IntelliJ settings:
Image showing the Reactive Streams configuration in IntelliJ
After some extra searching, I found that the PubSubInboundChannelAdapter uses reactive streams in the background, and to debug reactive streams you need a tool that can visualize and inspect the events flowing through them.
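For completeness, the same debug hook can also be enabled programmatically instead of through the IDE setting; a minimal sketch (the main class below is just an assumed placeholder):
@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        // Enable Reactor's operator debug mode before the application context starts;
        // this is roughly the same hook the IntelliJ Reactive Streams debugger setting enables.
        Hooks.onOperatorDebug();
        SpringApplication.run(Application.class, args);
    }
}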

Related

Spring Integration Error/Exception handling

I have started working with Spring Integration to send messages to an external system using the Spring Integration Google Pub/Sub model.
I am sending the payload received by the service activator as shown below:
@ServiceActivator(inputChannel = "inputChannel")
public void messageReceiver(final String payloadMessage) throws IOException {
    adapter.sendData(payloadMessage); // send payloadMessage data to the external system, add exception handlers
}
What I want is to implement exception handling around adapter.sendData(payloadMessage) for various scenarios, such as:
The external system being down
Network issues between my system and the external system
I have been following the Google Cloud documentation below, as well as other online documentation, but have not found a use case that sufficiently handles these scenarios:
https://cloud.google.com/pubsub/docs/spring#using-spring-integration-channel-adapters
Given these scenarios, I would like to implement exception handling in such a way that data is not lost when exceptions occur, and the external system still receives the data after some period of time.
I have configured the error channel below. Now, whenever there is an error in the sendData() method, I see the same failure messages repeating in the Eclipse console. Do I also need to add the spring.cloud.gcp.pubsub.subscriber.max-ack-extension-period parameter in the YAML?
@Bean
public PubSubInboundChannelAdapter messageChannelAdapter(
        final @Qualifier("myInputChannel") MessageChannel inputChannel,
        PubSubTemplate pubSubTemplate) {
    PubSubInboundChannelAdapter adapter = new PubSubInboundChannelAdapter(pubSubTemplate, pubSubSubscriptionName);
    adapter.setOutputChannel(inputChannel);
    adapter.setAckMode(AckMode.AUTO_ACK);
    adapter.setErrorChannelName("pubsubErrors");
    return adapter;
}

@ServiceActivator(inputChannel = "pubsubErrors")
public void pubsubErrorHandler(Message<MessagingException> exceptionMessage) {
    BasicAcknowledgeablePubsubMessage originalMessage = (BasicAcknowledgeablePubsubMessage) exceptionMessage
            .getPayload().getFailedMessage().getHeaders().get(GcpPubSubHeaders.ORIGINAL_MESSAGE);
    originalMessage.nack();
}
Sounds like you need some retry and backoff logic around your exceptions.
See more info in docs: https://docs.spring.io/spring-integration/reference/html/messaging-endpoints.html#message-handler-advice-chain.
The @ServiceActivator annotation has an adviceChain attribute for your consideration.
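For illustration, a hedged sketch of that approach using Spring Integration's RequestHandlerRetryAdvice (the bean names, retry count, and backoff values here are assumptions, not something from the original post):
@Bean
public RequestHandlerRetryAdvice retryAdvice() {
    // Retry the handler call with exponential backoff before giving up
    ExponentialBackOffPolicy backOff = new ExponentialBackOffPolicy();
    backOff.setInitialInterval(1000);
    backOff.setMultiplier(2.0);
    backOff.setMaxInterval(30000);

    RetryTemplate retryTemplate = new RetryTemplate();
    retryTemplate.setBackOffPolicy(backOff);
    retryTemplate.setRetryPolicy(new SimpleRetryPolicy(5)); // 5 attempts, then the failure propagates

    RequestHandlerRetryAdvice advice = new RequestHandlerRetryAdvice();
    advice.setRetryTemplate(retryTemplate);
    return advice;
}

@ServiceActivator(inputChannel = "inputChannel", adviceChain = "retryAdvice")
public void messageReceiver(final String payloadMessage) throws IOException {
    adapter.sendData(payloadMessage);
}
If all retries are exhausted, the exception flows to the configured error channel (pubsubErrors in the snippet above), where the message can be nacked for redelivery.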

Rabbit MQ + Spring Boot: delay between resend broken messages

I'm creating an application using Spring Boot with RabbitMQ.
I've created the following configuration for Rabbit:
@Configuration
public class RabbitConfiguration {

    public static final String RESEND_DISPOSAL_QUEUE = "RESEND_DISPOSAL";

    @Bean
    public Queue resendDisposalQueue() {
        return new Queue(RESEND_DISPOSAL_QUEUE, true);
    }

    @Bean
    public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory(ConnectionFactory connectionFactory) {
        SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
        factory.setConnectionFactory(connectionFactory);
        return factory;
    }

    @Bean
    public RabbitTemplate rabbitTemplate(ConnectionFactory connectionFactory) {
        return new RabbitTemplate(connectionFactory);
    }
}
I've also created a listener for Rabbit messages:
@RabbitListener(queues = RESEND_DISPOSAL_QUEUE)
public void getResendDisposalPayload(String messageBody) {
    LOGGER.info("[getResendDisposalPayload] message = {}", messageBody);
    // And there is some business logic
}
Everything works pretty well, but there is one problem.
When I get an exception in the getResendDisposalPayload method, which listens to the RESEND_DISPOSAL_QUEUE queue (for example, a temporary problem with the database), RabbitMQ starts redelivering the unprocessed message immediately, without any delay. This produces a huge amount of log output and is problematic for my system.
As I've read in this article, https://www.baeldung.com/spring-amqp-exponential-backoff, "using a Dead Letter Queue is a standard way to deal with failed messages".
In order to use this pattern, I have to create a RetryOperationsInterceptor, which defines the number of delivery attempts and the delay between attempts.
For example:
@Bean
public RetryOperationsInterceptor retryInterceptor() {
    return RetryInterceptorBuilder.stateless()
            .backOffOptions(1000, 3.0, 10000)
            .maxAttempts(3)
            .recoverer(messageRecoverer)
            .build();
}
This sounds very good, but there is one problem: I can't define an infinite number of attempts in maxAttempts.
After maxAttempts is reached, I have to save the broken message somewhere and deal with it later, which demands extra code.
The question is: is there any way to configure RabbitMQ to resend broken messages indefinitely with some delay, say a one-second delay?
RabbitMQ starts redelivering the unprocessed message immediately, without any delay
That's how redelivery works: it re-pushes the same message again and again until you ack it manually or drop it altogether. There is no delay between redeliveries simply because a new message is not pulled from the queue until something is done with this one.
I can't define an infinite number of attempts in maxAttempts
Have you tried Integer.MAX_VALUE? That's a pretty decent number of attempts.
The other way is to use a Delayed Exchange: https://docs.spring.io/spring-amqp/docs/current/reference/html/#delayed-message-exchange.
You can configure that retry with a RepublishMessageRecoverer to publish back into your original queue after some attempts are exhausted: https://docs.spring.io/spring-amqp/docs/current/reference/html/#async-listeners
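As a hedged sketch of the Integer.MAX_VALUE idea applied to the interceptor from the question (the one-second fixed delay is an assumption):
@Bean
public RetryOperationsInterceptor retryInterceptor() {
    return RetryInterceptorBuilder.stateless()
            .backOffOptions(1000, 1.0, 1000)    // initial interval 1 s, multiplier 1.0, max 1 s => constant one-second delay
            .maxAttempts(Integer.MAX_VALUE)     // effectively unlimited redelivery attempts
            .build();
}
Note that with effectively unlimited attempts a recoverer is never invoked, so a permanently broken ("poison") message will be retried forever; the delayed-exchange or RepublishMessageRecoverer options above avoid that.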

How does Kafka Schema registration happen in Spring Cloud Stream?

I am trying to understand how to use Spring Cloud Streams with the Kafka Binder.
Currently, I am trying to register an AVRO schema with my Confluent Schema Registry and send messages to a topic.
I am unable to understand how the schema registration is being done by Spring Cloud Streams behind the scenes.
Let's take this example from the Spring Cloud Stream samples.
The AVRO schema is located in src/resources/avro.
When the mvn compile goal is run, the POJO for the AVRO schema is generated and the producer can post data.
But what I am not able to understand is how Spring Cloud Stream is registering the AVRO schema.
@Autowired
StreamBridge streamBridge;

@Bean
public Supplier<Sensor> supplier() {
    return () -> {
        Sensor sensor = new Sensor();
        sensor.setId(UUID.randomUUID().toString() + "-v1");
        sensor.setAcceleration(random.nextFloat() * 10);
        sensor.setVelocity(random.nextFloat() * 100);
        sensor.setTemperature(random.nextFloat() * 50);
        return sensor;
    };
}

@Bean
public Consumer<Sensor> receiveAndForward() {
    return s -> streamBridge.send("sensor-out-0", s);
}

@Bean
Consumer<Sensor> receive() {
    return s -> System.out.println("Received Sensor: " + s);
}
Is it done when the beans are created?
Or is it done when the first message is sent? If so, how does Spring Cloud Stream know where to find the .avsc file?
Basically, what is happening under the hood?
There seems to be no mention of this in the docs.
Thanks.
Your serialization strategy (in this case, AVRO) is always handled in the serializers (for producers) and deserializers (for consumers).
You can have Avro (de)serialized keys and/or Avro (de)serialized values. Which means one should pass in KafkaAvroSerializer.class/KafkaAvroDeserializer.class to the producer/consumer configs, respectively. On top of this, one must pass in the schema.registry.url to the clients config as well.
So behind the scenes, Spring Cloud Stream makes your application Avro-compatible when it creates your producers/consumers (using the configs found in application.properties or elsewhere). Your clients will connect to the schema registry on startup (the logs will tell you if the connection failed), but they do not do any schema registration out of the box.
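For illustration, a hedged application.properties sketch of those client configs when using the Kafka binder (the exact property paths and the local registry URL are assumptions and can vary by Spring Cloud Stream version):
# Assumed example: Confluent Avro (de)serializers plus the registry URL, passed through to the Kafka clients
spring.cloud.stream.kafka.binder.producer-properties.value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
spring.cloud.stream.kafka.binder.producer-properties.schema.registry.url=http://localhost:8081
spring.cloud.stream.kafka.binder.consumer-properties.value.deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
spring.cloud.stream.kafka.binder.consumer-properties.schema.registry.url=http://localhost:8081
# Let the native serializer handle encoding for the output binding
spring.cloud.stream.bindings.sensor-out-0.producer.use-native-encoding=true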
Schema registration is done when the first message gets sent. If you look at the generated POJOs, you'll see that they already contain the schema, so Spring Cloud Stream doesn't need the .avsc files at all. For example, my last generated Avro POJO contained (line 4):
@org.apache.avro.specific.AvroGenerated
public class AvroBalanceMessage extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
    private static final long serialVersionUID = -539731109258473824L;
    public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"AvroBalanceMessage\",\"namespace\":\"tech.nermindedovic\",\"fields\":[{\"name\":\"accountNumber\",\"type\":\"long\",\"default\":0},{\"name\":\"routingNumber\",\"type\":\"long\",\"default\":0},{\"name\":\"balance\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"default\":\"0.00\"},{\"name\":\"errors\",\"type\":\"boolean\",\"default\":false}]}");
    public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
    .......
When producers send this POJO, they communicate with the registry about the current version of the schema. If the schema is not in the registry, the registry will store it and identify it by ID. The producer then sends the message with its schema ID to the Kafka broker. On the other side, the consumer gets this message and checks whether it has seen that ID before (schemas are cached so you don't always have to retrieve them from the registry); if it hasn't, it asks the registry for the information about the message.
A bit outside the scope of Spring Cloud Stream, but one can also use the Schema Registry's REST API to manually register schemas.

spring integration inboundChannelAdapter produce more than one message at a time?

I tried to define an InboundChannelAdapter to read messages from a queue API (Azure in this case). The native approach looks like this:
@Bean
@InboundChannelAdapter(value = "myChannelExample",
        poller = @Poller(fixedDelay = "1000",
                maxMessagesPerPoll = "1"))
public MessageSource<QueueMessage> queueReadingMessageSource() {
    return () -> wrapMessage(queueClient.readMessage());
}
This works as expected, but I was wondering if there is a more efficient way to define an adapter that can read multiple messages (maxMessagesPerPoll > 1) at once from the message source. Is there a MessageSource interface that allows returning a list of messages?
You can simply return a message with a List<QueueMessage> payload and add a splitter downstream.
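A minimal sketch of that idea, assuming a hypothetical batch-read method on the queue client (readMessages(int) and the channel names are illustrative, not part of any real API here):
@Bean
@InboundChannelAdapter(value = "myChannelExample", poller = @Poller(fixedDelay = "1000"))
public MessageSource<List<QueueMessage>> batchQueueMessageSource() {
    return () -> {
        // Assumed batch variant of readMessage(); returns up to 10 messages per poll
        List<QueueMessage> batch = queueClient.readMessages(10);
        return batch.isEmpty() ? null : MessageBuilder.withPayload(batch).build();
    };
}

@Splitter(inputChannel = "myChannelExample", outputChannel = "singleMessageChannel")
public List<QueueMessage> split(List<QueueMessage> batch) {
    // Each element of the returned list becomes its own message downstream
    return batch;
}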

RabbitMQ + Spring RabbitTemplate: timeout for "convertAndSend"

I've got fairly simple code that uses Spring's RabbitTemplate to send messages to RabbitMQ. The code is not interested in receiving messages; it is a simple fire-and-forget scenario.
rabbitTemplate.convertAndSend(exchange, routingKey, payload);
The template is created like this (please note that I'm using a transacted channel):
@Bean
public RabbitTemplate rabbitTemplate() {
    val rabbitTemplate = new RabbitTemplate(connectionFactory());
    rabbitTemplate.setChannelTransacted(true);
    rabbitTemplate.setMessageConverter(jsonConverter());
    return rabbitTemplate;
}
I faced an issue when the RabbitMQ server was overloaded: this call hung for a long time and never timed out. The connection itself did not die, but the RabbitMQ server had nearly full RAM and 100% CPU usage, so it wasn't responsive.
Is there a way to configure either Spring's RabbitTemplate or the underlying AmqpTemplate to time out on a simple send if it blocks for too long?
