Google PubSub Java (Scala) Client Gets Excessive Amount of Resent Messages

Google PubSub Java (Scala) Client Gets Excessive Amount of Resent Messages - java

I have a scenario where I load a subscription with around 1100 messages. I then start a Spark job which pulls messages from this subscription with these settings:
MaxOutstandingElementCount: 5
MaxAckExtensionPeriod: 60 min
AckDeadlineSeconds: 600
The first message to get processed starts a cache generation which takes about 30 minutes to complete. Any other messages arriving while this is going on are simply "returned" with no ack or nack. After that, a given message takes between 1 min and 30 mins to process. With an ack extension period of 60 min, I would never expect to see resending of messages.
The behaviour I am seeing is that while the initial cache is being generated, every 10 minutes 5 new messages are grabbed by the client and returned with no ack or nack by my code. This is unexpected. I would expect the deadline of the original 5 messages to be extended up to an hour.
Furthermore, after having processed and acked about 500 of the messages, I would expect around 600 left in the subscription, but I see almost the original 1100. These turn out to be resent duplicates, as I log these in my code. This is also very unexpected.
This is a screenshot from google console after around 500 messages have been processed and acked (ignore the first "hump", that was an aborted test run):
Am I missing something?
Here is the setup code:
val name = ProjectSubscriptionName.of(ConfigurationValues.ProjectId,
ConfigurationValues.PubSubSubscription)
val topic = ProjectTopicName.of(ConfigurationValues.ProjectId,
ConfigurationValues.PubSubSubscriptionTopic)
val pushConfig = PushConfig.newBuilder.build
val ackDeadlineSeconds = 600
subscriptionAdminClient.createSubscription(
name,
topic,
pushConfig,
ackDeadlineSeconds)
val flowControlSettings = FlowControlSettings.newBuilder()
.setMaxOutstandingElementCount(5L)
.build();
// create a subscriber bound to the asynchronous message receiver
val subscriber = Subscriber
.newBuilder(subscriptionName, new EtlMessageReceiver(spark))
.setFlowControlSettings(flowControlSettings)
.setMaxAckExtensionPeriod(Duration.ofMinutes(60))
.build
subscriber.startAsync.awaitRunning()
Here is the code in the receiver which runs when a message arrives while the cache is being generated:
if(!BIQConnector.cacheGenerationDone){
Utilities.logLine(
s"PubSub message for work item $uniqueWorkItemId ignored as cache is still being generated.")
return
}
And finally when a message has been processed:
consumer.ack()
Utilities.logLine(s"PubSub message ${message.getMessageId} for $tableName acknowledged.")
// Write back to ETL Manager
Utilities.logLine(
s"Writing result message back to topic ${etlResultTopic} for table $tableName, $tableDetailsForLog.")
sendPubSubResult(importTableName, validTableName, importTimestamp, 2, etlResultTopic, stageJobData,
tableDetailsForLog, "Success", isDeleted)

Is your Spark job using a Pub/Sub client library to pull messages? These libraries should indeed keep extending your message deadlines up to the MaxAckExtensionPeriod you specified.
If your job is using a Pub/Sub client library, this is unexpected behavior. You should contact Google Cloud support with your project name, subscription name, client library version, and a sample of the message IDs from the messages you are "returning" without acking. They will be able to investigate further into why you're receiving these resent messages.

Related

Ack pubSub message outside of the MessageReciever

I am using async Pull to pull messages from a pupSub topic, do some processing and send messages to ActiveMQ topic.
With the current configuration of pupSub I have to ack() the messages upon recieval. This however, does not suit my use case, as I need to ONLY ack() messages after they are successfully processed and sent to the other Topic. this means (per my understanding) ack()ing the messages outside the messageReciver.
I tried to save the each message and its AckReplyConsumer to be able to call it later and ack() the messages, this however does not work as expected. and not all messages are correctly ack() ed.
So I want to know if this is possible at all. and if Yes how
my subscriber configs
public Subscriber getSubscriber(CompositeConfigurationElement compositeConfigurationElement, Queue<CustomPupSubMessage> messages) throws IOException {
ProjectSubscriptionName subscriptionName = ProjectSubscriptionName.of(compositeConfigurationElement.getPubsub().getProjectid(),
compositeConfigurationElement.getSubscriber().getSubscriptionId());
ExecutorProvider executorProvider =
InstantiatingExecutorProvider.newBuilder().setExecutorThreadCount(2).build();
// Instantiate an asynchronous message receiver.
MessageReceiver receiver =
(PubsubMessage message, AckReplyConsumer consumer) -> {
messages.add(CustomPupSubMessage.builder().message(message).consumer(consumer).build());
};
// The subscriber will pause the message stream and stop receiving more messages from the
// server if any one of the conditions is met.
FlowControlSettings flowControlSettings =
FlowControlSettings.newBuilder()
// 1,000 outstanding messages. Must be >0. It controls the maximum number of messages
// the subscriber receives before pausing the message stream.
.setMaxOutstandingElementCount(compositeConfigurationElement.getSubscriber().getOutstandingElementCount())
// 100 MiB. Must be >0. It controls the maximum size of messages the subscriber
// receives before pausing the message stream.
.setMaxOutstandingRequestBytes(100L * 1024L * 1024L)
.build();
//read credentials
InputStream input = new FileInputStream(compositeConfigurationElement.getPubsub().getSecret());
CredentialsProvider credentialsProvider = FixedCredentialsProvider.create(ServiceAccountCredentials.fromStream(input));
Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
.setParallelPullCount(compositeConfigurationElement.getSubscriber().getSubscriptionParallelThreads())
.setFlowControlSettings(flowControlSettings)
.setCredentialsProvider(credentialsProvider)
.setExecutorProvider(executorProvider)
.build();
return subscriber;
}
my processing part
jmsConnection.start();
for (int i = 0; i < patchSize; i++) {
var message = messages.poll();
if (message != null) {
byte[] payload = message.getMessage().getData().toByteArray();
jmsMessage = jmsSession.createBytesMessage();
jmsMessage.writeBytes(payload);
jmsMessage.setJMSMessageID(message.getMessage().getMessageId());
producer.send(jmsMessage);
list.add(message.getConsumer());
} else break;
}
jmsSession.commit();
jmsSession.close();
jmsConnection.close();
// if upload is successful then ack the messages
log.info("sent " + list.size() + " in direction " + dest);
list.forEach(consumer -> consumer.ack());

There is nothing that requires messages to be acked within the MessageReceiver callback and you should be able to acknowledge messages asynchronously. There are a few things to keep in mind and look for:
Check to ensure that you are calling ack before the ack deadline expires. By default, the Java client library does extend the ack deadline for up to 1 hour, so if you are taking less time than that to process, you should be okay.
If your subscriber is often flow controlled, consider reducing the value you pass into setParallelPullCount to 1. The flow control settings you pass in are passed to each stream, not divided among them, so if each stream is able to receive the full value passed in and your processing is slow enough, you could be exceeding the 1-hour deadline in the client library without having even received the message yet, causing the duplicate delivery. You really only need to use setParallelPullCount to a larger value if you are able to process messages much faster than a single stream can deliver them.
Ensure that your client library version is at least 1.109.0. There were some improvements made to the way flow control was done in that version.
Note that Pub/Sub has at-least-once delivery semantics, meaning messages can be redelivered, even if ack is called properly. Note that not acknowledging or nacking a single message could result in the redelivery of all messages that were published together in a single batch. See the "Message Redelivery & Duplication Rate
" section of "Fine-tuning Pub/Sub performance with batch and flow control settings."
If all of that still doesn't fix the issue, then it would be best to try to create a small, self-contained example that reproduces the issue and open up a bug in the GitHub repo.

Periods of prolonged inactivity and frequent MessageLockLostException in QueueClient

Background
We have a data transfer solution with Azure Service Bus as the message broker. We are transferring data from x datasets through x queues - with x dedicated QueueClients as senders. Some senders publish messages at the rate of one message every two seconds, while others publish one every 15 minutes.
The application on the data source side (where senders are) is working just fine, giving us the desired throughput.
On the other side, we have an application with one QueueClient receiver per queue with the following configuration:
maxConcurrentCalls = 1
autoComplete = true (if receive mode = RECEIVEANDDELETE) and false (if receive mode = PEEKLOCK) - we have some receivers where, if they shut-down unexpectedly, would want to preserve the messages in the Service Bus Queue.
maxAutoRenewDuration = 3 minutes (lock duraition on all queues = 30 seconds)
an Executor service with a single thread
The MessageHandler registered with each of these receivers does the following:
public CompletableFuture<Void> onMessageAsync(final IMessage message) {
// deserialize the message body
final CustomObject customObject = (CustomObject)SerializationUtils.deserialize((byte[])message.getMessageBody().getBinaryData().get(0));
// process processDB1() and processDB2() asynchronously
final List<CompletableFuture<Boolean>> processFutures = new ArrayList<CompletableFuture<Boolean>>();
processFutures.add(processDB1(customObject)); // processDB1() returns Boolean
processFutures.add(processDB2(customObject)); // processDB2() returns Boolean
// join both the completablefutures to get the result Booleans
List<Boolean> results = CompletableFuture.allOf(processFutures.toArray(new CompletableFuture[processFutures.size()])).thenApply(future -> processFutures.stream()
.map(CompletableFuture<Boolean>::join).collect(Collectors.toList())
if (results.contains(false)) {
// dead-letter the message if results contains false
return getQueueClient().deadLetterAsync(message.getLockToken());
} else {
// complete the message otherwise
getQueueClient().completeAsync(message.getLockToken());
}
}
We tested with the following scenarios:
Scenario 1 - receive mode = RECEIVEANDDELETE, message publish rate: 30/ minute
Expected Behavior
The messages should be received continuosuly with a constant throughput (which need not necessarily be the throughput at source, where messages are published).
Actual behavior
We observe random, long periods of inactivity from the QueueClient - ranging from minutes to hours - there is no Outgoing Messages from the Service Bus namespace (observed on the Metrics charts) and there are no consumption logs for the same time periods!
Scenario 2 - receive mode = PEEKLOCK, message publish rate: 30/ minute
Expected Behavior
The messages should be received continuosuly with a constant throughput (which need not necessarily be the throughput at source, where messages are published).
Actual behavior
We keep seeing MessageLockLostException constantly after 20-30 minutes into the run of the application.
We tried doing the following -
we reduced the prefetch count (from 20 * processing rate - as mentioned in the Best Practices guide) to a bare minimum (to even 0 in one test cycle), to reduce the no. of messages that are locked for the client
increased the maxAutoRenewDuration to 5 minutes - our processDB1() and processDB2() do not take more than a second or two for almost 90% of the cases - so, I think the lock duration of 30 seconds and maxAutoRenewDuration are not issues here.
removed the blocking CompletableFuture.get() and made the processing synchronous.
None of these tweaks helped us fix the issue. What we observed is that the COMPLETE or RENEWMESSAGELOCK are throwing the MessageLockLostException.
We need help with finding answers for the following:
why is there a long period of inactivity of the QueueClient in scenario 1?
how do we know the MessageLockLostExceptions are thrown, because the locks have indeed expired? we suspect the locks cannot expire too soon, as our processing happens in a second or two. disabling prefetch also did not solve this for us.
Versions and Service Bus details
Java - openjdk-11-jre
Azure Service Bus namespace tier: Standard
Java SDK version - 3.4.0

For Scenario 1 :
If you have the duplicate detection history enabled, there is a possibility of this behavior happening as per the below explained scenario :
I had enabled for 30 seconds. I constantly hit Service bus with duplicate messages ( im my case messages with the same messageid from the client - 30 /per minute). I would be seeing a no activity outgoing for the window. Though the messages are received at the servicebus from the sending client, I was not be able to see them in outgoing messages. You could probably check whether you re encountering the duplicate messages which are filtered - inturn resulting inactivity in outgoing.
Also Note : You can't enable/disable duplicate detection after the queue is created. You can only do so at the time of creating the queue.

The issue was not with the QueueClient object per se. It was with the processes that we were triggering from within the MessageHandler: processDB1(customObject) and processDB2(customObject). since these processes were not optimized, the message consumption dropped and the locks gor expired (in peek-lock mode), as the handler was spending more time (in relation to the rate at which messages were published to the queues) in completing these opertations.
After optimizing the processes, the consumption and completion (in peek-lock mode) were just fine.

Google Cloud Pub/Sub retries the message delivery if it does not get a response in 30 seconds. Intended?

I'm using a Java 8 servlet as a Cloud Pub/Sub push endpoint.
On my push endpoint I have a long-running blocking operation, that sometimes runs for over a minute.
After the operation is done, I return a 200 response, acking the message.
If I return a 500 server error, the message is retried, which is expected.
Note that I create my subscription with a maximum allowed deadline ack period of 600 seconds.
What I have noticed is that if my long-running operation runs for over 30 seconds, the message is also retried. Seems like the HTTP connection that is used for push delivery does not live for over 30 seconds or something.
Is this intended? Is it configurable somehow? Thanks in advance.

For push subscriptions, Cloud Pub/Sub does not send a negative acknowledgment (sometimes known as a nack). If your webhook does not return a success code within the acknowledgment deadline, Cloud Pub/Sub retries delivery until the message expires after the subscription's message retention period. You can configure a default acknowledgment deadline for push subscriptions when you create the push subscription (select push subscription and set Acknowledgement deadline).
Note that, unlike for pull subscriptions, the deadline cannot be extended for individual messages. The deadline is effectively the amount of time the endpoint has to respond to the push request.

Yes. It is expected.
Pubsub guarantee at-least one time delivery until you acknowledge the message.
You can add delay in Push subscription by adding setting in Subsccription.
Go To subscription -> Edit ->Acknowledgement deadline ->
set values from 10 Seconds to 600 Seconds(10 Minutes).

Get messages that was issued before MQTT subscriber was started [duplicate]

I'm new in MQTT there is a simple range of numbers which I want to print I have created 2 files in which the 1st file whose send data to the 2nd file and the script is like that:
sender.py
import paho.mqtt.client as mqtt
client = mqtt.Client()
client.connect("192.168.1.169", 1883, 60)
for i in range(1,100):
client.publish("TestTopic", i)
print(i)
client.disconnect()
receiver.py:
import paho.mqtt.client as mqtt
def on_connect(client, userdata, flags, rc):
print("Connected with result code "+str(rc))
client.subscribe("house/bulbs/bulb1")
def on_message(client, userdata, msg):
# print(msg.topic+" "+str(msg.payload))
print("message received ", str(msg.payload.decode("utf-8")))
print("message topic=", msg.topic)
print("message qos=", msg.qos)
print("message retain flag=", msg.retain)
client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("192.168.1.169", 1883, 60)
client.loop_forever()
I'm able to print the data if the receiver file is active but I have a problem in printing it if I started the sender file and then I started the receiver file ,main question is does MQTT follows the queueing Mechanism or not if yes then ....if I'm running the sender file then its all data should be in queue and after that when I'm run the other file which is receiver then I should get printed.. but its not working in the same way please help me I went lots of documents but i'm able to find any relevant info.. recently I found clean_session if someone have knowledge about this please tell me ....have any questions related my code or anything please let me know
thanks

MQTT is a pub/sub protocol, not a message queuing system.
This means under normal circumstances if there is no subscriber running when a message is published then it will not be delivered.
It is possible to get the broker to queue messages for a specific subscriber, but this requires the subscriber to have been connected before the message is published and to have subscribed with a QOS of greater than 0. Then as long as it reconnects with the clean session flag set to false and the same client id after the publish then the broker will deliver the missed messages.
Retained messages are something different. If a message is published with the retained flag set to true then the broker will deliver this single message to every subscriber when they subscribe to the matching topic. There can only ever be 1 retained message for a given topic.

GCP Pubsub high latency on low message/sec

I'm publishing Pubsub messages from AppEngine Flexible environment with the JAVA client library like this:
Publisher publisher = Publisher
.newBuilder(ProjectTopicName.of(Utils.getApplicationId(), "test-topic"))
.setBatchingSettings(
BatchingSettings.newBuilder()
.setIsEnabled(false)
.build())
.build();
publisher.publish(PubsubMessage.newBuilder()
.setData(ByteString.copyFromUtf8(message))
.putAttributes("timestamp", String.valueOf(System.currentTimeMillis()))
.build());
I'm subscribing to the topic in Dataflow and logging how long it takes for the message to reach Dataflow from AppEngine flexible
pipeline
.apply(PubsubIO.readMessagesWithAttributes().fromSubscription(Utils.buildPubsubSubscription(Constants.PROJECT_NAME, "test-topic")))
.apply(ParDo.of(new DoFn<PubsubMessage, PubsubMessage>() {
#ProcessElement
public void processElement(ProcessContext c) {
long timestamp = System.currentTimeMillis() - Long.parseLong(c.element().getAttribute("timestamp"));
System.out.println("Time: " + timestamp);
}
}));
pipeline.run();
When I'm publishing messages at the rate of a few messages per second then the logs show that the time needed for the message to reach Dataflow is between 100ms and 1.5 seconds.
But when the rate is about 100 messages per second then the time is constantly between 100ms - 200ms, which seems totally adequate.
Can someone explain this behavior. It seems as turning off the publisher batching does not work.

Pub/Sub is designed for high throughput messages for both Subscription cases.
Pull subscription works best when there's large volume of messages, it's the kind of subscription you would use when throughput of message processing is your priority. Specially note that synchronous pull doesn't handle messages as soon as they are published, and can choose to pull and handle a fixed number of messages (more messages, more pulls). A better option would be to use asynchronous pull, which uses a long running message listener and acknowledges one message at a time [1].
On the other hand, Push subscription uses a slow-start algorithm: The number of messages sent is doubled with each successful delivery until it reaches its constraints (more messages, more -and faster- deliveries).
[1] https://cloud.google.com/pubsub/docs/pull#asynchronous-pull

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.