Message transfer between two topics in Google Cloud Pub/Sub - Java

We have a use case where, on any action from the UI, we need to read messages from Google Pub/Sub Topic A synchronously and move those messages to Topic B.
Below is the code written to handle this behavior; it is based on the Google Pub/Sub docs for pulling from a subscription synchronously.
public static int subscribeSync(String projectId, String subscriptionId, Integer numOfMessages, int count, String acknowledgementTopic) throws IOException {
    SubscriberStubSettings subscriberStubSettings =
        SubscriberStubSettings.newBuilder()
            .setTransportChannelProvider(
                SubscriberStubSettings.defaultGrpcTransportProviderBuilder()
                    .setMaxInboundMessageSize(20 * 1024 * 1024) // 20MB (maximum message size).
                    .build())
            .build();

    try (SubscriberStub subscriber = GrpcSubscriberStub.create(subscriberStubSettings)) {
        String subscriptionName = ProjectSubscriptionName.format(projectId, subscriptionId);
        PullRequest pullRequest =
            PullRequest.newBuilder()
                .setMaxMessages(numOfMessages)
                .setSubscription(subscriptionName)
                .build();

        // Use pullCallable().futureCall to asynchronously perform this operation.
        PullResponse pullResponse = subscriber.pullCallable().call(pullRequest);

        List<String> ackIds = new ArrayList<>();
        for (ReceivedMessage message : pullResponse.getReceivedMessagesList()) {
            // START - CODE TO PUBLISH MESSAGE TO TOPIC B
            publishMessage(message.getMessage(), acknowledgementTopic, projectId);
            // END - CODE TO PUBLISH MESSAGE TO TOPIC B
            ackIds.add(message.getAckId());
        }

        // Acknowledge received messages.
        AcknowledgeRequest acknowledgeRequest =
            AcknowledgeRequest.newBuilder()
                .setSubscription(subscriptionName)
                .addAllAckIds(ackIds)
                .build();

        // Use acknowledgeCallable().futureCall to asynchronously perform this operation.
        subscriber.acknowledgeCallable().call(acknowledgeRequest);
        count = pullResponse.getReceivedMessagesList().size();
    } catch (Exception e) {
        log.error(e.getMessage());
    }
    return count;
}
Below is the sample code to publish messages to Topic B
public static void publishMessage(PubsubMessage pubsubMessage, String topic, String projectId) {
    Publisher publisher = null;
    ProjectTopicName topicName = ProjectTopicName.newBuilder().setProject(projectId).setTopic(topic).build();
    try {
        // Publish the messages to the target topic.
        publisher = Publisher.newBuilder(topicName).build();
    } catch (IOException e) {
        log.error(e.getMessage());
    }
    publisher.publish(pubsubMessage);
}
Is this the right way of handling this use case, or can it be handled in some other way? We do not want to use Cloud Dataflow. Can someone let us know whether this is fine or whether there is an issue?
The code works, but sometimes messages stay on Topic A even after they are consumed synchronously.
Thanks.

There are some issues with the code as presented.
You should really only use synchronous pull if there are specific reasons why you need to do so. In general, it is much better to use asynchronous pull via the client libraries. It will be more efficient and reduce the latency of moving messages from one topic to the other. You do not show how you call subscribeSync, but in order to process messages efficiently and ensure that you actually process all messages, you'd need to be calling it many times in parallel continuously. If you are going to stick with synchronous pull, then you should reuse the SubscriberStub object as recreating it for every call will be inefficient.
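If synchronous pull is kept, a minimal sketch of reusing the stub could look like the following (the lazy-singleton wrapper and its names are illustrative, not part of the original code):
import com.google.cloud.pubsub.v1.stub.GrpcSubscriberStub;
import com.google.cloud.pubsub.v1.stub.SubscriberStub;
import com.google.cloud.pubsub.v1.stub.SubscriberStubSettings;
import java.io.IOException;

// Build the stub once and reuse it for every pull instead of recreating it per call.
public class SharedSubscriberStub {

    private static SubscriberStub subscriber;

    public static synchronized SubscriberStub get() throws IOException {
        if (subscriber == null) {
            SubscriberStubSettings settings =
                SubscriberStubSettings.newBuilder()
                    .setTransportChannelProvider(
                        SubscriberStubSettings.defaultGrpcTransportProviderBuilder()
                            .setMaxInboundMessageSize(20 * 1024 * 1024)
                            .build())
                    .build();
            subscriber = GrpcSubscriberStub.create(settings);
        }
        return subscriber;
    }
}
With a shared stub, subscribeSync would no longer wrap it in try-with-resources; it would be closed once when the application shuts down.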
You don't reuse your Publisher object. As a result, you are not able to take advantage of the batching that the publisher client can do. You should create the Publisher once and reuse it across your calls for publishes to the same topic. If the passed-in topic can differ across messages, then keep a map from topic to publisher and retrieve the right one from the map.
You don't wait for the result of the call to publish. It is possible that this call fails, but you do not handle that failure. As a result, you could acknowledge the message on the first topic without it having actually been published, resulting in message loss.
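A rough sketch that addresses both points might look like this (the class, map, and method names are illustrative assumptions, not from the question):
import com.google.api.core.ApiFuture;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;

// Sketch only: reuse one Publisher per topic and block on the publish future
// before the original message becomes eligible for acknowledgement.
public class ForwardingPublisher {

    private final Map<String, Publisher> publishers = new ConcurrentHashMap<>();

    public String publishAndWait(PubsubMessage message, String topic, String projectId)
            throws ExecutionException, InterruptedException {
        Publisher publisher = publishers.computeIfAbsent(topic, t -> {
            try {
                return Publisher.newBuilder(TopicName.of(projectId, t)).build();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        ApiFuture<String> future = publisher.publish(message);
        // Blocking here (or attaching a callback) surfaces publish failures so the
        // caller can decide not to ack the message pulled from Topic A.
        return future.get();
    }
}
Blocking on every future keeps the sequential structure of the original loop; for higher throughput, attaching a callback and acking only on success would also work.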
With regard to messages staying on Topic A after being consumed (i.e., redelivery/duplicates): Pub/Sub offers at-least-once delivery guarantees, so even with proper acking, it is still possible to receive messages again (typical duplicate rates are around 0.1%). There can be many different reasons for duplicates. In your case, since you are processing messages sequentially and recreating a publisher for every call, it could be that later messages are not acked before the ack deadline expires, which results in redelivery.

Related

Spring Cloud Stream - notice and handle errors in broker

I am fairly new to developing distributed applications with messaging, and to Spring Cloud Stream in particular. I am currently wondering about best practices on how to deal with errors on the broker side.
In our application, we need to both consume and produce messages from/to multiple sources/destinations like this:
Consumer side
For consuming, we have defined multiple @Beans of type java.util.function.Consumer. The configuration for those looks like this:
spring.cloud.stream.bindings.consumeA-in-0.destination=inputA
spring.cloud.stream.bindings.consumeA-in-0.group=$Default
spring.cloud.stream.bindings.consumeB-in-0.destination=inputB
spring.cloud.stream.bindings.consumeB-in-0.group=$Default
This part works quite well - when starting the application, the exchanges "inputA" and "inputB" as well as the queues "inputA.$Default" and "inputB.$Default" with the corresponding bindings are automatically created in RabbitMQ.
Also, in case of an error (e.g. a queue is suddenly not available), the application gets notified immediately with a QueuesNotAvailableException and continuously tries to re-establish the connection.
My only question here is: Is there some way to handle this exception in code? Or, what are best practices to deal with failures like this on broker side?
Producer side
This one is more problematic. Producing messages is triggered by some internal logic, so we cannot use function @Beans here. Instead, we currently rely on StreamBridge to send messages. The problem is that this approach does not trigger creation of exchanges and queues on startup. So when our code calls streamBridge.send("outputA", message), the message is sent (the result is true), but it just disappears into the void, since RabbitMQ automatically drops unroutable messages.
I found that with this configuration, I can at least get RabbitMQ to create exchanges and queues as soon as the first message is sent:
spring.cloud.stream.source=produceA;produceB
spring.cloud.stream.default.producer.requiredGroups=$Default
spring.cloud.stream.bindings.produceA-out-0.destination=outputA
spring.cloud.stream.bindings.produceB-out-0.destination=outputB
I need to use streamBridge.send("produceA-out-0", message) in code to make it work, which is not too great since it means having explicit configuration hardcoded, but at least it works.
I also tried to implement the producer in a Reactor style as described in this answer, but in this case the exchange/queue is also not created on application startup, and the sent message just disappears even though the return status of the sending method is "OK".
Failures on the broker side are not registered at all with this approach - when I simulate one e.g. by deleting the queue or the exchange, it is not registered by the application. Only when another message is sent, I get in the logs:
ERROR 21804 --- [127.0.0.1:32404] o.s.a.r.c.CachingConnectionFactory : Shutdown Signal: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'produceA-out-0' in vhost '/', class-id=60, method-id=40)
Still, the result of StreamBridge#send was true in this case, but we need to know that sending actually failed at this point (we persist the state of the sent object using this boolean return value). Is there any way to accomplish that?
Any other suggestions on how to make this producer scenario more robust? Best practices?
EDIT
I found an interesting solution to the producer problem using correlations:
...
CorrelationData correlation = new CorrelationData(UUID.randomUUID().toString());
messageHeaderAccessor.setHeader(AmqpHeaders.PUBLISH_CONFIRM_CORRELATION, correlation);
Message<String> message = MessageBuilder.createMessage(payload, messageHeaderAccessor.getMessageHeaders());
boolean sent = streamBridge.send(channel, message);
try {
    final CorrelationData.Confirm confirm = correlation.getFuture().get(30, TimeUnit.SECONDS);
    if (correlation.getReturned() == null && confirm.isAck()) {
        // success logic
    } else {
        // failed logic
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    // failed logic
} catch (ExecutionException | TimeoutException e) {
    // failed logic
}
using these additional configurations:
spring.cloud.stream.rabbit.default.producer.useConfirmHeader=true
spring.rabbitmq.publisher-confirm-type=correlated
spring.rabbitmq.publisher-returns=true
This seems to work quite well, although I'm still clueless about the return value of StreamBridge#send: it is always true, and I cannot find information on the cases in which it would be false. But the rest is fine; I can get information on issues with the exchange or the queue from the correlation or the confirm.
But this solution is very much focused on RabbitMQ, which causes two problems:
- our application should be able to connect to different brokers (e.g. Azure Service Bus)
- in tests we use the Kafka binder and I don't know how to configure the application context to make it work in this case, too
Any help would be appreciated.
On the consumer side, you can listen for an event such as the ListenerContainerConsumerFailedEvent.
https://docs.spring.io/spring-amqp/docs/current/reference/html/#consumer-events
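A minimal sketch of such an event listener (the bean name and the logging are illustrative):
import org.springframework.amqp.rabbit.listener.ListenerContainerConsumerFailedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

// React to consumer failures raised by the listener container (alerting, metrics,
// or custom recovery logic would go here).
@Component
public class ConsumerFailureEvents {

    @EventListener
    public void onConsumerFailed(ListenerContainerConsumerFailedEvent event) {
        // isFatal() indicates whether the container gave up or will keep retrying
        System.err.println("Consumer failed, fatal=" + event.isFatal()
                + ", reason=" + event.getReason());
    }
}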
On the producer side, producers only know about exchanges, not any queues bound to them; hence the requiredGroups property which causes the queue to be bound.
You only need spring.cloud.stream.default.producer.requiredGroups=$Default - you can send to arbitrary destinations using the StreamBridge and the infrastructure will be created.
@SpringBootApplication
public class So70769305Application {

    public static void main(String[] args) {
        SpringApplication.run(So70769305Application.class, args);
    }

    @Bean
    ApplicationRunner runner(StreamBridge bridge) {
        return args -> bridge.send("foo", "test");
    }
}
spring.cloud.stream.default.producer.requiredGroups=$Default

How to halt consuming messages in an AMQP for a certain period of time (where time is not fixed)?

I am trying to achieve the following scenario in my application:
1. When my application is up, the messages from the incoming exchange should be consumed by the incoming queue.
2. If any exception/error occurs, the messages are directed to the DeadLetter Queue.
3. When downtime is going on for my application (I don't want to consume messages during that time), I am redirecting the messages to the ParkingLot Queue.
4. When downtime is over, I want to first consume the messages from the ParkingLot Queue, and then start consuming messages normally using the Incoming Queue.
My question is: Can these scenarios be implemented? Here, mainly I am talking about step 4. If yes, can someone please point me in the correct direction?
My second question is: Is it the correct way to achieve this scenario? Or is there a better way to achieve it?
Code added:
#RabbitListener(queues = "${com.rabbitmq.queueName}", id="msgId")
#RabbitListener(queues = "${com.rabbitmq.parkingQueueName}", id="parkingId")
public void consumeMessage(Message message) {
try {
log.info("Received message: {}",new String(message.getBody()));
//check if the application is down
if(val) {
registry.getListenerContainer("msgId").stop();
rabbitTemplate.send(rabbitMQConfig.getExchange(), rabbitMQConfig.getParkingRoutingKey(), message);
}
}catch(Exception e) {
rabbitTemplate.send(rabbitMQConfig.getDeadLetterExchange(), rabbitMQConfig.getDeadLetterRoutingKey(), message);
}
}
Give each @RabbitListener an id attribute.
Then use the RabbitListenerEndpointRegistry bean to control the containers' lifecycles.
registry.getListenerContainer(id).stop();
and
registry.getListenerContainer(id).start();
You can put both @RabbitListener annotations on the same method.
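Putting those pieces together, a rough sketch of the downtime toggle could look like this (the listener ids match the question's code; the class name, when these methods are invoked, and how you decide the parking lot is drained are assumptions):
import org.springframework.amqp.rabbit.listener.RabbitListenerEndpointRegistry;
import org.springframework.stereotype.Component;

@Component
public class DowntimeToggle {

    private final RabbitListenerEndpointRegistry registry;

    public DowntimeToggle(RabbitListenerEndpointRegistry registry) {
        this.registry = registry;
    }

    public void enterDowntime() {
        // stop consuming from both queues; newly arriving messages wait (or are parked)
        registry.getListenerContainer("msgId").stop();
        registry.getListenerContainer("parkingId").stop();
    }

    public void exitDowntime() {
        // step 4: drain the ParkingLot Queue first...
        registry.getListenerContainer("parkingId").start();
        // ...then resume normal consumption; detecting that the parking lot is empty
        // (e.g. via a queue-depth check) is application-specific
        registry.getListenerContainer("msgId").start();
    }
}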

Ack Pub/Sub message outside of the MessageReceiver

I am using async pull to pull messages from a Pub/Sub topic, do some processing, and send them to an ActiveMQ topic.
With the current configuration of Pub/Sub I have to ack() the messages upon receipt. This, however, does not suit my use case, as I need to ack() messages ONLY after they are successfully processed and sent to the other topic. This means (per my understanding) ack()ing the messages outside the MessageReceiver.
I tried to save each message and its AckReplyConsumer to be able to call it later and ack() the messages; this, however, does not work as expected, and not all messages are correctly ack()ed.
So I want to know whether this is possible at all, and if yes, how.
My subscriber config:
public Subscriber getSubscriber(CompositeConfigurationElement compositeConfigurationElement, Queue<CustomPupSubMessage> messages) throws IOException {
    ProjectSubscriptionName subscriptionName = ProjectSubscriptionName.of(compositeConfigurationElement.getPubsub().getProjectid(),
            compositeConfigurationElement.getSubscriber().getSubscriptionId());

    ExecutorProvider executorProvider =
            InstantiatingExecutorProvider.newBuilder().setExecutorThreadCount(2).build();

    // Instantiate an asynchronous message receiver.
    MessageReceiver receiver =
            (PubsubMessage message, AckReplyConsumer consumer) -> {
                messages.add(CustomPupSubMessage.builder().message(message).consumer(consumer).build());
            };

    // The subscriber will pause the message stream and stop receiving more messages from the
    // server if any one of the conditions is met.
    FlowControlSettings flowControlSettings =
            FlowControlSettings.newBuilder()
                    // 1,000 outstanding messages. Must be > 0. It controls the maximum number of messages
                    // the subscriber receives before pausing the message stream.
                    .setMaxOutstandingElementCount(compositeConfigurationElement.getSubscriber().getOutstandingElementCount())
                    // 100 MiB. Must be > 0. It controls the maximum size of messages the subscriber
                    // receives before pausing the message stream.
                    .setMaxOutstandingRequestBytes(100L * 1024L * 1024L)
                    .build();

    // Read credentials.
    InputStream input = new FileInputStream(compositeConfigurationElement.getPubsub().getSecret());
    CredentialsProvider credentialsProvider = FixedCredentialsProvider.create(ServiceAccountCredentials.fromStream(input));

    Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
            .setParallelPullCount(compositeConfigurationElement.getSubscriber().getSubscriptionParallelThreads())
            .setFlowControlSettings(flowControlSettings)
            .setCredentialsProvider(credentialsProvider)
            .setExecutorProvider(executorProvider)
            .build();

    return subscriber;
}
My processing part:
jmsConnection.start();
for (int i = 0; i < patchSize; i++) {
    var message = messages.poll();
    if (message != null) {
        byte[] payload = message.getMessage().getData().toByteArray();
        jmsMessage = jmsSession.createBytesMessage();
        jmsMessage.writeBytes(payload);
        jmsMessage.setJMSMessageID(message.getMessage().getMessageId());
        producer.send(jmsMessage);
        list.add(message.getConsumer());
    } else {
        break;
    }
}
jmsSession.commit();
jmsSession.close();
jmsConnection.close();

// If the upload is successful then ack the messages.
log.info("sent " + list.size() + " in direction " + dest);
list.forEach(consumer -> consumer.ack());
There is nothing that requires messages to be acked within the MessageReceiver callback and you should be able to acknowledge messages asynchronously. There are a few things to keep in mind and look for:
Check to ensure that you are calling ack before the ack deadline expires. By default, the Java client library does extend the ack deadline for up to 1 hour, so if you are taking less time than that to process, you should be okay.
If your subscriber is often flow controlled, consider reducing the value you pass into setParallelPullCount to 1. The flow control settings you pass in are passed to each stream, not divided among them, so if each stream is able to receive the full value passed in and your processing is slow enough, you could be exceeding the 1-hour deadline in the client library without having even received the message yet, causing the duplicate delivery. You really only need to use setParallelPullCount to a larger value if you are able to process messages much faster than a single stream can deliver them.
Ensure that your client library version is at least 1.109.0. There were some improvements made to the way flow control was done in that version.
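As an illustration of the first two points, the builder from the question's getSubscriber method could be adjusted roughly as follows (whether Duration comes from org.threeten.bp or java.time depends on the client library version):
Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
        .setParallelPullCount(1)                           // a single stream, unless throughput demands more
        .setMaxAckExtensionPeriod(Duration.ofMinutes(60))  // cap on how long the library extends the ack deadline
        .setFlowControlSettings(flowControlSettings)
        .setCredentialsProvider(credentialsProvider)
        .setExecutorProvider(executorProvider)
        .build();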
Note that Pub/Sub has at-least-once delivery semantics, meaning messages can be redelivered even if ack is called properly. Also note that not acknowledging or nacking a single message could result in the redelivery of all messages that were published together in a single batch. See the "Message Redelivery & Duplication Rate" section of "Fine-tuning Pub/Sub performance with batch and flow control settings."
If all of that still doesn't fix the issue, then it would be best to try to create a small, self-contained example that reproduces the issue and open up a bug in the GitHub repo.

How to consume 100 messages without ack, then do work and then acknowledge them?

I'm consuming messages from RabbitMQ with Spring AMQP.
I'm consuming one message at a time, and it's pretty slow because I save each message to the DB, opening and closing a transaction every time.
Right now I've set up a consumer like this:
#RabbitListener(queues = "queuename")
public void receive(Message message) {
someservice.saveToDb(message);
}
But this is really slow. I would like to consume a bunch of messages before I start saving them; then I can open a transaction, save 300, commit, and load the next batch.
Would something like this work?
class MessageChannelTag {
    Message message;
    Channel channel;
    long tag;
}

@Component
class ConsumerClass {

    List<MessageChannelTag> messagesToSave = new ArrayList<>();

    @RabbitListener(queues = "queuename")
    public void receive(Message message, Channel channel, @Header(AmqpHeaders.DELIVERY_TAG) long tag)
            throws IOException {
        messagesToSave.add(new MessageChannelTag(message, channel, tag));
    }

    @Scheduled(fixedDelay = 500)
    public void saveMessagesToDb() {
        List<MessageChannelTag> saveTheese = new ArrayList<>(messagesToSave);
        messagesToSave.clear();
        service.saveMessages(saveTheese);
        for (MessageChannelTag messageChannelTag : saveTheese) {
            // In the service I could mark the rows if save succeeded or not and
            // then out here I could ack or nack..
            messageChannelTag.getChannel().basicAck(messageChannelTag.getTag(), false);
        }
    }
}
Or if there is a simpler solution let me know. I prefer fast, simple and robust =)
It might also be worth investigating whether an "upstream" producer can provide batches of messages instead of individual ones.
Don't use the "pull" API (basic.get); it is not nearly as efficient as consuming messages.
Set prefetch (also known as QoS) to 300, then acknowledge the messages at once when you are done. I am not familiar with Spring but I'm certain there are decorators or other ways to accomplish this.
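In Spring AMQP terms, the prefetch and manual acknowledgement would typically be configured on the listener container factory, roughly like this sketch (the bean name is illustrative):
import org.springframework.amqp.core.AcknowledgeMode;
import org.springframework.amqp.rabbit.config.SimpleRabbitListenerContainerFactory;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class BatchAckConfig {

    @Bean
    public SimpleRabbitListenerContainerFactory batchFactory(ConnectionFactory connectionFactory) {
        SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
        factory.setConnectionFactory(connectionFactory);
        factory.setPrefetchCount(300);                       // prefetch / QoS: up to 300 unacked deliveries
        factory.setAcknowledgeMode(AcknowledgeMode.MANUAL);  // ack only after the DB transaction commits
        return factory;
    }
}
A listener would then reference it via @RabbitListener(queues = "queuename", containerFactory = "batchFactory") and call channel.basicAck(lastTag, true) once the batch has been saved.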
This is all covered in the docs and tutorials - https://www.rabbitmq.com/consumer-prefetch.html
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.

RabbitMQ. Java client. Is it possible to acknowledge message not on the same thread it was received?

I want to fetch several messages, handle them, and ack them all together after that. So basically I receive a message, put it in some queue, and continue receiving messages from Rabbit. A different thread will monitor this queue of received messages and process them when the amount is sufficient. All I've been able to find about acking contains examples only for one message, which is processed on the same thread, like this (from the official docs):
channel.basicQos(1);
final Consumer consumer = new DefaultConsumer(channel) {
    @Override
    public void handleDelivery(String consumerTag, Envelope envelope, AMQP.BasicProperties properties, byte[] body) throws IOException {
        String message = new String(body, "UTF-8");
        System.out.println(" [x] Received '" + message + "'");
        try {
            doWork(message);
        } finally {
            System.out.println(" [x] Done");
            channel.basicAck(envelope.getDeliveryTag(), false);
        }
    }
};
The documentation also says this:
Channel instances must not be shared between threads. Applications should prefer using a Channel per thread instead of sharing the same Channel across multiple threads. While some operations on channels are safe to invoke concurrently, some are not and will result in incorrect frame interleaving on the wire.
So I'm confused here. If I'm acking some message and at the same time the channel is receiving another message from Rabbit, is that considered two operations at the same time? It seems to me like yes.
I've tried to acknowledge a message on the same channel from a different thread and it seems to work, but the documentation says that I should not share channels between threads. So I've tried to do the acknowledgment on a different thread with a different channel, but it fails because the delivery tag is unknown to that channel.
Is it possible to acknowledge a message not on the same thread it was received on?
UPD
Example piece of code of what I want. It's in Scala, but I think it's straightforward.
case class AmqpMessage(envelope: Envelope, msgBody: String)

val queue = new ArrayBlockingQueue[AmqpMessage](100)

val consumeChannel = connection.createChannel()
consumeChannel.queueDeclare(queueName, true, false, true, null)
consumeChannel.basicConsume(queueName, false, new DefaultConsumer(consumeChannel) {
  override def handleDelivery(consumerTag: String,
                              envelope: Envelope,
                              properties: BasicProperties,
                              body: Array[Byte]): Unit = {
    queue.put(new AmqpMessage(envelope, new String(body)))
  }
})

Future {
  // this is a different thread
  val channel = connection.createChannel()
  while (true) {
    try {
      val amqpMessage = queue.take()
      channel.basicAck(amqpMessage.envelope.getDeliveryTag, false)        // doesn't work
      consumeChannel.basicAck(amqpMessage.envelope.getDeliveryTag, false) // works, but seems like not thread safe
    } catch {
      case e: Exception => e.printStackTrace()
    }
  }
}
Although the documentation is pretty restrictive, some operations on channels are safe to invoke concurrently.
You may ACK messages in the different thread as long as consuming and acking are the only actions you do on the channel.
See this SO question, which deals with the same thing:
RabbitMQ and channels Java thread safety
For me your solution is correct. You are not sharing channels across threads.
You never pass your channel object to another thread; you use it on the same thread that receives the messages.
It is not possible that you are "acking some message and at the same time the channel is receiving another message from rabbit". If you are in the handleDelivery method, that thread is blocked by your code and has no chance of receiving another message.
As you found out, you cannot acknowledge a message using a channel other than the channel that was used to receive it.
You must acknowledge using the same channel, and you must do that on the same thread that received the message. So you may pass the channel object to other methods and classes, but you must be careful not to pass it to another thread.
I use this solution in my project. It uses a RabbitMQ listener and Spring Integration. For every AMQP message, one org.springframework.integration.Message is created. That message has the AMQP message body as its payload, and the AMQP channel and delivery tag as headers of my org.springframework.integration.Message.
If you want to acknowledge several messages, and they were delivered on the same channel, you should use
channel.basicAck(envelope.getDeliveryTag(), true);
For multiple channels, an efficient algorithm is: say you have 100 messages delivered over 10 channels. Find the max deliveryTag for each channel and invoke channel.basicAck(maxDeliveryTagForThatChannel, true); once per channel. This way you need 10 basicAck calls (network round trips), not 100.
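A hypothetical sketch of that algorithm, assuming each buffered item carries the Channel it was delivered on and its delivery tag (the item type below is made up for illustration):
import com.rabbitmq.client.Channel;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchAcker {

    // Minimal holder for illustration; in real code this would be the buffered message type.
    public static class Delivery {
        final Channel channel;
        final long deliveryTag;

        public Delivery(Channel channel, long deliveryTag) {
            this.channel = channel;
            this.deliveryTag = deliveryTag;
        }
    }

    public static void ackBatch(List<Delivery> batch) throws IOException {
        // Highest delivery tag seen per channel.
        Map<Channel, Long> maxTagPerChannel = new HashMap<>();
        for (Delivery d : batch) {
            maxTagPerChannel.merge(d.channel, d.deliveryTag, Math::max);
        }
        // One basicAck per channel; multiple=true acknowledges everything up to that tag.
        for (Map.Entry<Channel, Long> e : maxTagPerChannel.entrySet()) {
            e.getKey().basicAck(e.getValue(), true);
        }
    }
}
Per the caveats above, each basicAck still has to go to the channel that delivered the messages, and access to that channel should not be interleaved with operations from other threads.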
As the docs say: one channel per thread; beyond that there are no restrictions.
I would just like to say a few things about your example. What you are trying to do here is wrong: there is no need to ack the message only after you take it from the ArrayBlockingQueue, because once you put it there, it stays there. Acking it to RabbitMQ has nothing to do with the ArrayBlockingQueue.
