Ack Pub/Sub messages outside of the MessageReceiver - java

I am using async Pull to pull messages from a Pub/Sub topic, do some processing, and send messages to an ActiveMQ topic.
With the current configuration of Pub/Sub I have to ack() the messages upon receipt. This, however, does not suit my use case, as I need to ack() messages ONLY after they are successfully processed and sent to the other topic. This means (per my understanding) ack()ing the messages outside the MessageReceiver.
I tried to save each message and its AckReplyConsumer so I could call it later and ack() the messages, but this does not work as expected and not all messages are correctly ack()ed.
So I want to know if this is possible at all, and if yes, how.
My subscriber config:
public Subscriber getSubscriber(CompositeConfigurationElement compositeConfigurationElement, Queue<CustomPupSubMessage> messages) throws IOException {
    ProjectSubscriptionName subscriptionName = ProjectSubscriptionName.of(compositeConfigurationElement.getPubsub().getProjectid(),
            compositeConfigurationElement.getSubscriber().getSubscriptionId());
    ExecutorProvider executorProvider =
            InstantiatingExecutorProvider.newBuilder().setExecutorThreadCount(2).build();

    // Instantiate an asynchronous message receiver that only queues the message
    // and its AckReplyConsumer; acking happens later, after processing.
    MessageReceiver receiver =
            (PubsubMessage message, AckReplyConsumer consumer) -> {
                messages.add(CustomPupSubMessage.builder().message(message).consumer(consumer).build());
            };

    // The subscriber will pause the message stream and stop receiving more messages from the
    // server if any one of the conditions is met.
    FlowControlSettings flowControlSettings =
            FlowControlSettings.newBuilder()
                    // Must be >0. It controls the maximum number of messages
                    // the subscriber receives before pausing the message stream.
                    .setMaxOutstandingElementCount(compositeConfigurationElement.getSubscriber().getOutstandingElementCount())
                    // 100 MiB. Must be >0. It controls the maximum size of messages the subscriber
                    // receives before pausing the message stream.
                    .setMaxOutstandingRequestBytes(100L * 1024L * 1024L)
                    .build();

    // Read credentials.
    InputStream input = new FileInputStream(compositeConfigurationElement.getPubsub().getSecret());
    CredentialsProvider credentialsProvider = FixedCredentialsProvider.create(ServiceAccountCredentials.fromStream(input));

    Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
            .setParallelPullCount(compositeConfigurationElement.getSubscriber().getSubscriptionParallelThreads())
            .setFlowControlSettings(flowControlSettings)
            .setCredentialsProvider(credentialsProvider)
            .setExecutorProvider(executorProvider)
            .build();
    return subscriber;
}
My processing part:
jmsConnection.start();
for (int i = 0; i < patchSize; i++) {
    var message = messages.poll();
    if (message != null) {
        byte[] payload = message.getMessage().getData().toByteArray();
        jmsMessage = jmsSession.createBytesMessage();
        jmsMessage.writeBytes(payload);
        jmsMessage.setJMSMessageID(message.getMessage().getMessageId());
        producer.send(jmsMessage);
        list.add(message.getConsumer());
    } else {
        break;
    }
}
jmsSession.commit();
jmsSession.close();
jmsConnection.close();

// If the upload is successful then ack the messages.
log.info("sent " + list.size() + " in direction " + dest);
list.forEach(consumer -> consumer.ack());

There is nothing that requires messages to be acked within the MessageReceiver callback; you should be able to acknowledge messages asynchronously. There are a few things to keep in mind and look for:
Check to ensure that you are calling ack before the ack deadline expires. By default, the Java client library extends the ack deadline for up to 1 hour, so if you are taking less time than that to process, you should be okay.
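If your batch processing can legitimately take longer, that extension period is configurable on the builder. A minimal sketch (whether the Duration type here is org.threeten.bp or java.time depends on your client library version):

Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
        // Keep extending the ack deadline for up to 2 hours while a message
        // is outstanding (the default maximum is 1 hour).
        .setMaxAckExtensionPeriod(Duration.ofHours(2))
        .build();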
If your subscriber is often flow controlled, consider reducing the value you pass into setParallelPullCount to 1. The flow control settings you pass in are applied to each stream, not divided among them, so if each stream is able to receive the full value passed in and your processing is slow enough, you could be exceeding the 1-hour deadline in the client library without having even received the message yet, causing duplicate delivery. You really only need a larger setParallelPullCount value if you are able to process messages much faster than a single stream can deliver them.
Ensure that your client library version is at least 1.109.0. There were some improvements made to the way flow control was done in that version.
Note that Pub/Sub has at-least-once delivery semantics, meaning messages can be redelivered even if ack is called properly. Note that not acknowledging or nacking a single message could result in the redelivery of all messages that were published together in a single batch. See the "Message Redelivery & Duplication Rate" section of "Fine-tuning Pub/Sub performance with batch and flow control settings."
If all of that still doesn't fix the issue, then it would be best to try to create a small, self-contained example that reproduces the issue and open up a bug in the GitHub repo.

Related

Message transfer in between two topics in google cloud pub sub

We have a use case where, on any action from the UI, we need to read messages from a Google Pub/Sub Topic A synchronously and move those messages to Topic B.
Below is the code that has been written to handle this behavior; it is from the Google Pub/Sub docs for accessing a topic synchronously.
public static int subscribeSync(String projectId, String subscriptionId, Integer numOfMessages, int count, String acknowledgementTopic) throws IOException {
    SubscriberStubSettings subscriberStubSettings =
            SubscriberStubSettings.newBuilder()
                    .setTransportChannelProvider(
                            SubscriberStubSettings.defaultGrpcTransportProviderBuilder()
                                    .setMaxInboundMessageSize(20 * 1024 * 1024) // 20MB (maximum message size).
                                    .build())
                    .build();

    try (SubscriberStub subscriber = GrpcSubscriberStub.create(subscriberStubSettings)) {
        String subscriptionName = ProjectSubscriptionName.format(projectId, subscriptionId);
        PullRequest pullRequest =
                PullRequest.newBuilder()
                        .setMaxMessages(numOfMessages)
                        .setSubscription(subscriptionName)
                        .build();

        // Use pullCallable().futureCall to asynchronously perform this operation.
        PullResponse pullResponse = subscriber.pullCallable().call(pullRequest);
        List<String> ackIds = new ArrayList<>();
        for (ReceivedMessage message : pullResponse.getReceivedMessagesList()) {
            // START - CODE TO PUBLISH MESSAGE TO TOPIC B
            publishMessage(message.getMessage(), acknowledgementTopic, projectId);
            // END - CODE TO PUBLISH MESSAGE TO TOPIC B
            ackIds.add(message.getAckId());
        }

        // Acknowledge received messages.
        AcknowledgeRequest acknowledgeRequest =
                AcknowledgeRequest.newBuilder()
                        .setSubscription(subscriptionName)
                        .addAllAckIds(ackIds)
                        .build();
        // Use acknowledgeCallable().futureCall to asynchronously perform this operation.
        subscriber.acknowledgeCallable().call(acknowledgeRequest);
        count = pullResponse.getReceivedMessagesList().size();
    } catch (Exception e) {
        log.error(e.getMessage());
    }
    return count;
}
Below is the sample code to publish messages to Topic B:
public static void publishMessage(PubsubMessage pubsubMessage, String Topic, String projectId) {
    Publisher publisher = null;
    ProjectTopicName topicName = ProjectTopicName.newBuilder().setProject(projectId).setTopic(Topic).build();
    try {
        // Publish the messages to normal topic.
        publisher = Publisher.newBuilder(topicName).build();
    } catch (IOException e) {
        log.error(e.getMessage());
    }
    publisher.publish(pubsubMessage);
}
Is this the right way of handling this use case, or can it be handled in some other way? We do not want to use Cloud Dataflow. Can someone let us know if this is fine or if there is an issue?
The code works, but sometimes messages stay on Topic A even after they are consumed synchronously.
Thanks
There are some issues with the code as presented.
You should really only use synchronous pull if there are specific reasons why you need to do so. In general, it is much better to use asynchronous pull via the client libraries. It will be more efficient and reduce the latency of moving messages from one topic to the other. You do not show how you call subscribeSync, but in order to process messages efficiently and ensure that you actually process all messages, you'd need to be calling it many times in parallel continuously. If you are going to stick with synchronous pull, then you should reuse the SubscriberStub object as recreating it for every call will be inefficient.
You don't reuse your Publisher object. As a result, you are not able to take advantage of the batching that the publisher client can do. You should create the Publisher once and reuse it across your calls for publishes to the same topic. If the passed-in topic can differ across messages, then keep a map from topic to publisher and retrieve the right one from the map.
You don't wait for the result of the call to publish. It is possible that this call fails, but you do not handle that failure. As a result, you could acknowledge the message on the first topic without it having actually been published, resulting in message loss.
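As a rough sketch of both points, reusing publishers and blocking on the publish future (publishAndWait and the publishers map are hypothetical helpers, not part of the client library):

private static final ConcurrentHashMap<String, Publisher> publishers = new ConcurrentHashMap<>();

static void publishAndWait(PubsubMessage message, String topic, String projectId) throws Exception {
    // Create at most one Publisher per topic and reuse it, so batching can kick in.
    Publisher publisher = publishers.computeIfAbsent(topic, t -> {
        try {
            return Publisher.newBuilder(ProjectTopicName.of(projectId, t)).build();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });
    // publish() returns an ApiFuture; get() blocks until the publish succeeds or fails.
    // Only ack the message on Topic A after this call returns successfully.
    String messageId = publisher.publish(message).get();
}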
With regard to your question about duplicates, Pub/Sub offers at-least-once delivery guarantees, so even with proper acking, it is still possible to receive messages again (typical duplicate rates are around 0.1%). There can be many different reasons for duplicates. In your case, since you are processing messages sequentially and recreating a publisher for every call, it could be that later messages are not acked before the ack deadline expires, which results in redelivery.

Subscribe to Google Pub/Sub Topic for X seconds and stop if no messages are received

I'm using the Google Pub/Sub Java SDK to subscribe to a topic. What I want to do is the following:
Start listening to a topic for X seconds (let's assume 25 seconds)
If a message is received then stop listening and process the message (this can take a few minutes)
After processing the message continue listening for a topic again for 25 seconds
If no message is received within 25 seconds then stop definitively listening
I can't seem to find anything about this in the documentation. Maybe it's just not possible?
Here's how I start the subscriber:
// Create a subscriber bound to the asynchronous message receiver
subscriber = Subscriber.newBuilder(projectSubscriptionName, new PubSubRoeMessageReceiver()).build();
// Start subscriber
subscriber.startAsync().awaitRunning();
// Allow the subscriber to run indefinitely unless an unrecoverable error occurs.
subscriber.awaitTerminated();
And this is what my message receiver looks like:
public class PubSubRoeMessageReceiver implements MessageReceiver {
    @Override
    public void receiveMessage(PubsubMessage pubsubMessage, AckReplyConsumer ackReplyConsumer) {
        // Acknowledge message
        System.out.println("Acknowledge message");
        ackReplyConsumer.ack();
        // TODO: stop the subscriber
        // TODO: run task X
        // TODO: start the subscriber
    }
}
Any ideas?
Using Cloud Pub/Sub in this way is an anti-pattern and would cause issues. If you immediately ack the message after you receive it, but before you process it, what do you do if the subscriber crashes for some reason? Pub/Sub won't redeliver the message, and therefore you may never process it.
Therefore, you probably want to wait to ack until after the message is processed. But then, you wouldn't be able to shut down the subscriber, because the fact that the message is outstanding would be lost; the ack deadline would expire and the message would get redelivered.
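A minimal sketch of that process-then-ack pattern inside the receiver (runTaskX is a hypothetical stand-in for your processing):

@Override
public void receiveMessage(PubsubMessage pubsubMessage, AckReplyConsumer ackReplyConsumer) {
    try {
        runTaskX(pubsubMessage); // may take minutes; the client library extends the deadline meanwhile
        ackReplyConsumer.ack();  // ack only after processing succeeded
    } catch (Exception e) {
        ackReplyConsumer.nack(); // ask Pub/Sub to redeliver on failure
    }
}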
If you want to ensure the client only receives one message at a time, you could use the FlowControlSettings on the client. If you set MaxOutstandingElementCount to 1, then only one message will be delivered to receiveMessage at a time:
subscriber = Subscriber.newBuilder(projectSubscriptionName, new PubSubRoeMessageReceiver())
.setFlowControlSettings(FlowControlSettings.newBuilder()
.setMaxOutstandingRequestBytes(10L * 1024L * 1024L) // 10MB messages allowed.
.setMaxOutstandingElementCount(1L) // Only 1 outstanding message at a time.
.build())
.build();
Keep in mind that if you have a large backlog of small messages at the time you start up the subscriber and you intend to start up multiple subscribers, you may run into inefficient load balancing as explained in the documentation.

JMS send same message back to SQS

I am working on an approach where I am required to send a message back to SQS.
I don't want it to go as a new message, as that will reset the ApproximateReceiveCount attribute, which is required by the code.
Please note that I cannot send a NACK to the queue, as I am reading a batch of 10 messages; in certain cases I want to manually post an individual message back, not the whole batch.
I tried setting the JMSMessageID, but that is not possible, as according to the documentation:
After you send messages, Amazon SQS sets the following headers and properties for each message:
JMSMessageID
JMS_SQS_SequenceNumber (only for FIFO queues)
The code i am using right now is
defaultJmsTemplate.send(destinationName, new MessageCreator() {
    @Override
    public Message createMessage(Session session) throws JMSException {
        Message message = session.createTextMessage(errorMessage);
        message.setJMSCorrelationID(transactionId);
        if (destinationName.endsWith(".fifo")) {
            message.setStringProperty("JMSXGroupID", property.getMessageGroup());
            message.setStringProperty("JMS_SQS_DeduplicationId", java.util.UUID.randomUUID().toString());
        }
        return message;
    }
});
Is there anything I can set/use to make sure the message is not treated as a new message and the approximate receive count is maintained?
Yes, this can be done. As you are using JMS for SQS, you can set up your consumer session with UNORDERED_ACKNOWLEDGE mode. By doing so, if you do not acknowledge a particular message, it will be redelivered after its visibility timeout expires, and its ApproximateReceiveCount will be incremented. This will not impact the other messages in the same batch.
One downside: if you are using a FIFO queue and all your messages have the same group ID, the next message will only be processed after the unacknowledged message ends up in the dead-letter queue, which happens only after the message has been retried for the Maximum Receives you configured on the FIFO queue.
Note: the key here is to not acknowledge that particular message.
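A minimal sketch of setting that mode up with the amazon-sqs-java-messaging-lib (the connection construction here is the standard one for that library; adjust to your client setup):

SQSConnectionFactory connectionFactory = new SQSConnectionFactory(
        new ProviderConfiguration(), AmazonSQSClientBuilder.defaultClient());
SQSConnection connection = connectionFactory.createConnection();
// With UNORDERED_ACKNOWLEDGE, each message is acknowledged individually; a message
// you skip reappears after its visibility timeout with an incremented ApproximateReceiveCount.
Session session = connection.createSession(false, SQSSession.UNORDERED_ACKNOWLEDGE);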

Making sure a message published on a topic exchange is received by at least one consumer

TLDR; In the context of a topic exchange and queues created on the fly by the consumers, how to have a message redelivered / the producer notified when no consumer consumes the message?
I have the following components:
a main service, producing files. Each file has a certain category (e.g. pictures.profile, pictures.gallery)
a set of workers, consuming files and producing a textual output from them (e.g. the size of the file)
I currently have a single RabbitMQ topic exchange.
The producer sends messages to the exchange with routing_key = file_category.
Each consumer creates a queue and binds the exchange to this queue for a set of routing keys (e.g. pictures.* and videos.trending).
When a consumer has processed a file, it pushes the result in a processing_results queue.
Now - this works properly, but it still has a major issue. Currently, if the publisher sends a message with a routing key that no consumer is bound to, the message will be lost. This is because even if the queue created by the consumers is durable, it is destroyed as soon as the consumer disconnects since it is unique to this consumer.
Consumer code (python):
channel.exchange_declare(exchange=exchange_name, type='topic', durable=True)
result = channel.queue_declare(exclusive = True, durable=True)
queue_name = result.method.queue
topics = [ "pictures.*", "videos.trending" ]
for topic in topics:
channel.queue_bind(exchange=exchange_name, queue=queue_name, routing_key=topic)
channel.basic_consume(my_handler, queue=queue_name)
channel.start_consuming()
Losing a message in this condition is not acceptable in my use case.
Attempted solution
However, "loosing" a message becomes acceptable if the producer is notified that no consumer received the message (in this case it can just resend it later). I figured out the mandatory field could help, since the specification of AMQP states:
This flag tells the server how to react if the message cannot be routed to a queue. If this flag is set, the server will return an unroutable message with a Return method.
This is indeed working - in the producer, I am able to register a ReturnListener:
rabbitMq.confirmSelect();
rabbitMq.addReturnListener((int replyCode, String replyText, String exchange, String routingKey,
        AMQP.BasicProperties properties, byte[] body) -> {
    log.info("A message was returned by the broker");
});
rabbitMq.basicPublish(exchangeName, "pictures.profile", true /* mandatory */, MessageProperties.PERSISTENT_TEXT_PLAIN, messageBytes);
This will as expected print A message was returned by the broker if a message is sent with a routing key no consumer is bound to.
Now, I also want to know when the message was correctly received by a consumer. So I tried registering a ConfirmListener as well:
rabbitMq.addConfirmListener(new ConfirmListener() {
    @Override
    public void handleAck(long deliveryTag, boolean multiple) throws IOException {
        log.info("ACK message {}, multiple = {}", deliveryTag, multiple);
    }

    @Override
    public void handleNack(long deliveryTag, boolean multiple) throws IOException {
        log.info("NACK message {}, multiple = {}", deliveryTag, multiple);
    }
});
The issue here is that the ACK is sent by the broker, not by the consumer itself. So when the producer sends a message with a routing key K:
If a consumer is bound to this routing key, the broker just sends an ACK
Otherwise, the broker sends a basic.return followed by a ACK
Cf the docs:
For unroutable messages, the broker will issue a confirm once the exchange verifies a message won't route to any queue (returns an empty list of queues). If the message is also published as mandatory, the basic.return is sent to the client before basic.ack. The same is true for negative acknowledgements (basic.nack).
So while my problem is theoretically solvable using this, it would make the logic of knowing if a message was correctly consumed very complicated (especially in the context of multi threading, persistence in a database, etc.):
send a message
on receive ACK:
    if no basic.return was received for this message:
        the message was correctly consumed
    else:
        the message wasn't correctly consumed
on receive basic.return:
    the message wasn't correctly consumed
Possible other solutions
Have a queue for each file category, i.e. the queues pictures_profile, pictures_gallery, etc. Not good, since it removes a lot of flexibility for the consumers.
Have a "response timeout" logic in the producer. The producer sends a message and expects an "answer" for it in the processing_results queue; it resends the message if it hasn't been answered after X seconds. I don't like it though; it would create some additional tricky logic in the producer.
Produce the messages with a TTL of 0, and have the producer listen on a dead-letter exchange. This is the official suggested replacement for the 'immediate' flag removed in RabbitMQ 3.0 (see the paragraph Removal of "immediate" flag). According to the docs on dead letter exchanges, a dead letter exchange can only be configured per queue, so it wouldn't work here.
[edit] A last solution I see is to have every consumer create a durable queue that isn't destroyed when it disconnects, and have it listen on that queue (a sketch follows below). Example: consumer1 creates queue-consumer-1, bound to the messages of myExchange having routing key abcd. The issue I foresee is that it implies finding a unique identifier for every consumer application instance (e.g. the hostname of the machine it runs on).
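A minimal sketch of that durable-queue variant with the RabbitMQ Java client (consumerId is a hypothetical stable identifier, e.g. derived from the hostname):

// Durable, non-exclusive, non-auto-delete: survives consumer disconnects.
String queueName = "queue-consumer-" + consumerId;
channel.queueDeclare(queueName, true /* durable */, false /* exclusive */, false /* autoDelete */, null);
channel.queueBind(queueName, "myExchange", "abcd");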
I would love to have some inputs on that - thanks!
Related to:
RabbitMQ: persistent message with Topic exchange (not applicable here since queues are created "on the fly")
Make sure the broker holds messages until at least one consumer gets it
RabbitMQ Topic Exchange with persisted queue
[edit] Solution
I ended up implementing something that uses a basic.return, as mentioned earlier. It is actually not so tricky to implement, you just have to make sure that your method producing the messages and the method handling the basic returns are synchronized (or have a shared lock if not in the same class), otherwise you can end up with interleaved execution flows that will mess up your business logic.
I believe that an alternate exchange would be the best fit for your use case for the part regarding the identification of not routed messages.
Whenever an exchange with a configured AE cannot route a message to any queue, it publishes the message to the specified AE instead.
Basically upon creation of the "main" exchange, you configure an alternate exchange for it.
For the referenced alternate exchange, I tend to go with a fanout, then create a queue (notroutedq) bound to it.
This means any message that is not published to at least one of the queues bound to your "main" exchange will end up in the notroutedq, as sketched below.
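For illustration, a minimal sketch of that wiring with the RabbitMQ Java client (exchange and queue names are placeholders):

Map<String, Object> args = new HashMap<>();
args.put("alternate-exchange", "my-ae"); // messages the main exchange cannot route go here
channel.exchangeDeclare("main-exchange", "topic", true /* durable */, false /* autoDelete */, args);
channel.exchangeDeclare("my-ae", "fanout", true);
channel.queueDeclare("notroutedq", true, false, false, null);
channel.queueBind("notroutedq", "my-ae", ""); // fanout ignores the routing key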
Now regarding your statement:
because even if the queue created by the consumers is durable, it is destroyed as soon as the consumer disconnects since it is unique to this consumer.
It seems that you have configured your queues with auto-delete set to true.
If so, in case of a disconnect, as you stated, the queue is destroyed and the messages still present on it are lost, a case not covered by the alternate exchange configuration.
It's not clear from your use case description whether you'd expect a message to end up in more than one queue in some cases; it seemed more like one queue per type of processing (while keeping the grouping flexible). If the queue split is indeed related to the type of processing, I do not see the benefit of setting the queue to auto-delete, except maybe not having to do any cleanup maintenance when you want to change the bindings.
Assuming you can go with durable queues, then a dead letter exchange (I would again go with a fanout) with a binding to a DLQ would cover the missing cases, as sketched after this list:
not routed: covered by the alternate exchange
correct processing: already handled by your processing_results queue
problematic processing, or messages that take too long to process: covered by the dead letter exchange, in which case the additional headers added upon dead-lettering the message can even help identify the type of action to take
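A minimal sketch of that dead-letter wiring (names are placeholders; the x-dead-letter-exchange argument is set per work queue):

Map<String, Object> qargs = new HashMap<>();
qargs.put("x-dead-letter-exchange", "my-dlx"); // rejected or expired messages are republished here
channel.exchangeDeclare("my-dlx", "fanout", true);
channel.queueDeclare("my-dlq", true, false, false, null);
channel.queueBind("my-dlq", "my-dlx", "");
channel.queueDeclare("pictures-queue", true, false, false, qargs); // work queue with the DLX attached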

RabbitMQ. Java client. Is it possible to acknowledge message not on the same thread it was received?

I want to fetch several messages, handle them, and ack them all together after that. So basically I receive a message, put it in some queue, and continue receiving messages from Rabbit. A different thread will monitor this queue of received messages and process them when the amount is sufficient. All I've been able to find about acking contains examples only for one message processed on the same thread. Like this (from the official docs):
channel.basicQos(1);
final Consumer consumer = new DefaultConsumer(channel) {
    @Override
    public void handleDelivery(String consumerTag, Envelope envelope, AMQP.BasicProperties properties, byte[] body) throws IOException {
        String message = new String(body, "UTF-8");
        System.out.println(" [x] Received '" + message + "'");
        try {
            doWork(message);
        } finally {
            System.out.println(" [x] Done");
            channel.basicAck(envelope.getDeliveryTag(), false);
        }
    }
};
And the documentation also says this:
Channel instances must not be shared between threads. Applications should prefer using a Channel per thread instead of sharing the same Channel across multiple threads. While some operations on channels are safe to invoke concurrently, some are not and will result in incorrect frame interleaving on the wire.
So I'm confused here. If I'm acking some message and at the same time the channel is receiving another message from Rabbit, is that considered two operations at the same time? It seems to me like yes.
I've tried to acknowledge a message on the same channel from a different thread, and it seems to work, but the documentation says I should not share channels between threads. So I've tried to do the acknowledgment on a different thread with a different channel, but it fails, because the delivery tag is unknown to that channel.
Is it possible to acknowledge message not on the same thread it was received?
UPD
Example piece of code of what I want. It's in scala, but I think it's straightforward.
case class AmqpMessage(envelope: Envelope, msgBody: String)

val queue = new ArrayBlockingQueue[AmqpMessage](100)

val consumeChannel = connection.createChannel()
consumeChannel.queueDeclare(queueName, true, false, true, null)
consumeChannel.basicConsume(queueName, false, new DefaultConsumer(consumeChannel) {
  override def handleDelivery(consumerTag: String,
                              envelope: Envelope,
                              properties: BasicProperties,
                              body: Array[Byte]): Unit = {
    queue.put(new AmqpMessage(envelope, new String(body)))
  }
})

Future {
  // this is a different thread
  val channel = connection.createChannel()
  while (true) {
    try {
      val amqpMessage = queue.take()
      channel.basicAck(amqpMessage.envelope.getDeliveryTag, false)        // doesn't work
      consumeChannel.basicAck(amqpMessage.envelope.getDeliveryTag, false) // works, but seems not thread safe
    } catch {
      case e: Exception => e.printStackTrace()
    }
  }
}
Although the documentation is pretty restrictive, some operations on channels are safe to invoke concurrently.
You may ACK messages in the different thread as long as consuming and acking are the only actions you do on the channel.
See this SO question, which deals with the same thing:
RabbitMQ and channels Java thread safety
For me your solution is correct. You are not sharing channels across threads.
You never pass your channel object to another thread; you use it on the same thread that receives the messages.
It is not possible that you are 'acking some message and at the same time the channel is receiving another message from rabbit'.
If you are in the handleDelivery method, that thread is blocked by your code and has no chance of receiving another message.
As you found out, you cannot acknowledge a message using a channel other than the channel on which it was received.
You must acknowledge using the same channel, and you must do that on the same thread that was receiving the message. So you may pass the channel object to other methods and classes, but you must be careful not to pass it to another thread.
I use this solution in my project. It uses a RabbitMQ listener and Spring Integration. For every AMQP message, one org.springframework.integration.Message is created. That message has the AMQP message body as payload, and the AMQP channel and delivery tag as headers of my org.springframework.integration.Message.
If you want to acknowledge several messages that were delivered on the same channel, you should use:
channel.basicAck(envelope.getDeliveryTag(), true);
For multiple channels, an efficient algorithm is (say you have 100 messages, delivered over 10 channels):
find the max deliveryTag for each channel, then
invoke channel.basicAck(maxDeliveryTagForThatChannel, true);
This way, you need 10 basicAck calls (network round trips), not 100.
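A minimal sketch of that bookkeeping (Delivery here is a hypothetical holder pairing each received message with its channel and delivery tag; remember that each basicAck must still happen safely with respect to its channel, per the answers above):

Map<Channel, Long> maxTagPerChannel = new HashMap<>();
for (Delivery d : batch) {
    maxTagPerChannel.merge(d.getChannel(), d.getDeliveryTag(), Math::max);
}
for (Map.Entry<Channel, Long> e : maxTagPerChannel.entrySet()) {
    // multiple = true acknowledges every unacked message up to and including this tag
    e.getKey().basicAck(e.getValue(), true);
}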
As the docs say, use one channel per thread; beyond that there are no restrictions.
I would just like to say a few things about your example. What you are trying to do here is wrong. There is no need to ACK the message only after you take it from the ArrayBlockingQueue, because once you put it there, it stays there. ACKing it to RabbitMQ has nothing to do with the ArrayBlockingQueue.
