Draining a job with processing_time timer - java

I'm working with a Dataflow job with stateful processing and a timer. The process is simplified as below:
Receiving messages from PubSub Subscription.
Keeping documents into bagState.
Checking with a loop timer (processing_time) if all conditions are met
If OK, clear the bagState and generate a new message for the next step.
Convert and send message to PubSub Topic.
I haven't set any particular windowing policy, so I'm using the GlobalWindow (as I understand it).
When draining is performed (with messages continuously incoming at ~1k/sec; I don't know if that is related), the job raises this exception:
Error message from worker: java.lang.IllegalArgumentException: Attempted to set a processing-time timer with an output timestamp of 294247-01-10T04:00:54.775Z that is after the expiration of window 294247-01-09T04:00:54.775Z
with the related stack trace:
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$TimerInternalsTimer.setAndVerifyOutputTimestamp(SimpleDoFnRunner.java:1229)
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$TimerInternalsTimer.setRelative(SimpleDoFnRunner.java:1138)
xxx.xxxxx.xxxxxxxxx.transform.AggregateLogsAuditFn.onLoopIteration(AggregateLogsAuditFn.java:292)
This error occurs when resetting the loop timer in the @ProcessElement method (or likewise in @OnTimer):
loopTimer.offset(Duration.standardSeconds(loopTimerSec.get())).setRelative();
The timer (and its value) are declared as:
@TimerId("loopTimer") private final TimerSpec loopTimer = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);
private ValueProvider<Integer> latenessTimerSec;
Since an exception is raised, the job won't stop properly, and we need to cancel it.
Please note that updating the job (with --update) works fine, and this exception never appears when the job is running normally.
Thanks for your advice, Lionel.

Draining advances the watermark to "the end of time" (i.e. 294247-01-10), and the error indicates that a timer is being set shortly after this. To make the loop timer compatible with drain, you should avoid setting it when the current timestamp is this high.
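A minimal sketch of that guard (assumptions: the check compares the timer timestamp from the @OnTimer context against GlobalWindow.INSTANCE.maxTimestamp(); onLoopIteration and loopTimerSec are the names from the question, and the rest of the method body is elided):
@OnTimer("loopTimer")
public void onLoopIteration(OnTimerContext context, @TimerId("loopTimer") Timer loopTimer) {
    // ... existing bagState checks and output ...
    // GlobalWindow.INSTANCE.maxTimestamp() is the window expiration quoted in the error message.
    if (context.timestamp().isBefore(GlobalWindow.INSTANCE.maxTimestamp())) {
        loopTimer.offset(Duration.standardSeconds(loopTimerSec.get())).setRelative();
    }
    // otherwise: skip re-arming the timer so the drain can complete
}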

Related

Debezium Interrupted while emitting initial DROP TABLE events

I'm trying to set up the Debezium engine with MariaDB and ActiveMQ. I'm using the Quarkus framework. I'm following the official documentation (https://debezium.io/documentation/reference/development/engine.html). When I start the engine, I get the following error:
2021-05-03 10:05:53,184 INFO [io.deb.pip.sou.AbstractSnapshotChangeEventSource] (debezium-mysqlconnector-my-app-connector-change-event-source-coordinator) Snapshot - Final stage
2021-05-03 10:05:53,184 WARN [io.deb.pip.ChangeEventSourceCoordinator] (debezium-mysqlconnector-my-app-connector-change-event-source-coordinator) Change event source executor was interrupted: java.lang.InterruptedException: Interrupted while emitting initial DROP TABLE events
I'm not really sure why this happens, and so far I've not been able to track down the source of the problem, so any kind of help will be appreciated.
I was able to resolve this by deleting the file configured with the property offset.storage.file.filename.
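For context, that property names the offset file the embedded engine is configured with; a sketch of where it is set (the storage class is the standard Debezium example value, the path is illustrative):
// Illustrative engine configuration - the file named here is the one to delete to reset offsets.
Properties props = new Properties();
props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
props.setProperty("offset.storage.file.filename", "/tmp/dbz-offsets.dat");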
// Run the engine asynchronously ...
ExecutorService executor = Executors.newSingleThreadExecutor();
executor.execute(engine);
// Do something else or wait for a signal or an event
Make sure you DO wait for something, or the connector thread will be terminated by the main thread, and you will get a message like "Snapshot was interrupted before completion".
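A minimal sketch of such a wait (assumptions: engine is the DebeziumEngine instance passed to the executor above; exception handling is omitted):
// Block the main thread until shutdown is requested, so the engine thread keeps running.
CountDownLatch shutdownLatch = new CountDownLatch(1);
Runtime.getRuntime().addShutdownHook(new Thread(shutdownLatch::countDown));
shutdownLatch.await();
// Then stop the engine and the executor cleanly.
engine.close();
executor.shutdown();
executor.awaitTermination(30, TimeUnit.SECONDS);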

Periods of prolonged inactivity and frequent MessageLockLostException in QueueClient

Background
We have a data transfer solution with Azure Service Bus as the message broker. We are transferring data from x datasets through x queues - with x dedicated QueueClients as senders. Some senders publish messages at the rate of one message every two seconds, while others publish one every 15 minutes.
The application on the data source side (where senders are) is working just fine, giving us the desired throughput.
On the other side, we have an application with one QueueClient receiver per queue, with the following configuration (a registration sketch follows the list):
maxConcurrentCalls = 1
autoComplete = true (if receive mode = RECEIVEANDDELETE) and false (if receive mode = PEEKLOCK) - for some receivers, if they shut down unexpectedly, we want to preserve the messages in the Service Bus queue.
maxAutoRenewDuration = 3 minutes (lock duration on all queues = 30 seconds)
an Executor service with a single thread
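For reference, a sketch of how that configuration maps to a registration call in SDK 3.x (connection string handling, CustomMessageHandler, and the variable names are illustrative, not from our code):
ExecutorService handlerExecutor = Executors.newSingleThreadExecutor();
QueueClient queueClient = new QueueClient(
        new ConnectionStringBuilder(connectionString, queueName), ReceiveMode.PEEKLOCK);
MessageHandlerOptions options = new MessageHandlerOptions(
        1,                      // maxConcurrentCalls
        false,                  // autoComplete (false for the PEEKLOCK receivers)
        Duration.ofMinutes(3)); // maxAutoRenewDuration
queueClient.registerMessageHandler(new CustomMessageHandler(queueClient), options, handlerExecutor);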
The MessageHandler registered with each of these receivers does the following:
public CompletableFuture<Void> onMessageAsync(final IMessage message) {
    // deserialize the message body
    final CustomObject customObject = (CustomObject) SerializationUtils
            .deserialize((byte[]) message.getMessageBody().getBinaryData().get(0));
    // run processDB1() and processDB2() asynchronously
    final List<CompletableFuture<Boolean>> processFutures = new ArrayList<>();
    processFutures.add(processDB1(customObject)); // processDB1() completes with a Boolean
    processFutures.add(processDB2(customObject)); // processDB2() completes with a Boolean
    // join both CompletableFutures to get the result Booleans
    final List<Boolean> results = CompletableFuture
            .allOf(processFutures.toArray(new CompletableFuture[0]))
            .thenApply(ignored -> processFutures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList()))
            .join();
    if (results.contains(false)) {
        // dead-letter the message if any result is false
        return getQueueClient().deadLetterAsync(message.getLockToken());
    } else {
        // complete the message otherwise
        return getQueueClient().completeAsync(message.getLockToken());
    }
}
We tested with the following scenarios:
Scenario 1 - receive mode = RECEIVEANDDELETE, message publish rate: 30/ minute
Expected Behavior
The messages should be received continuously with a constant throughput (which need not necessarily be the throughput at the source, where messages are published).
Actual behavior
We observe random, long periods of inactivity from the QueueClient - ranging from minutes to hours - there is no Outgoing Messages from the Service Bus namespace (observed on the Metrics charts) and there are no consumption logs for the same time periods!
Scenario 2 - receive mode = PEEKLOCK, message publish rate: 30/ minute
Expected Behavior
The messages should be received continuously with a constant throughput (which need not necessarily be the throughput at the source, where messages are published).
Actual behavior
We keep seeing MessageLockLostException constantly after 20-30 minutes into the run of the application.
We tried doing the following -
we reduced the prefetch count (from 20 * processing rate, as mentioned in the Best Practices guide) to a bare minimum (even to 0 in one test cycle), to reduce the number of messages that are locked for the client
increased the maxAutoRenewDuration to 5 minutes - our processDB1() and processDB2() do not take more than a second or two in almost 90% of cases - so I think the lock duration of 30 seconds and maxAutoRenewDuration are not the issue here.
removed the blocking CompletableFuture.get() and made the processing synchronous.
None of these tweaks helped us fix the issue. What we observed is that the COMPLETE and RENEWMESSAGELOCK operations are the ones throwing the MessageLockLostException.
We need help with finding answers for the following:
why is there a long period of inactivity of the QueueClient in scenario 1?
how do we know whether the MessageLockLostExceptions are thrown because the locks have indeed expired? We suspect the locks cannot be expiring that soon, as our processing completes within a second or two. Disabling prefetch also did not solve this for us.
Versions and Service Bus details
Java - openjdk-11-jre
Azure Service Bus namespace tier: Standard
Java SDK version - 3.4.0
For Scenario 1:
If you have duplicate detection history enabled, there is a possibility of this behavior happening, as in the scenario explained below:
I had it enabled for 30 seconds. I constantly hit Service Bus with duplicate messages (in my case, messages with the same messageId from the client, 30 per minute). I would see no outgoing activity for that window. Though the messages were received at the Service Bus from the sending client, I was not able to see them in the outgoing messages. You could check whether you are encountering duplicate messages that are being filtered out, in turn resulting in the inactivity on the outgoing side.
Also note: you can't enable/disable duplicate detection after the queue is created. You can only do so at the time of creating the queue.
The issue was not with the QueueClient object per se. It was with the processes we were triggering from within the MessageHandler: processDB1(customObject) and processDB2(customObject). Since these processes were not optimized, message consumption dropped and the locks expired (in peek-lock mode), as the handler was spending more time (relative to the rate at which messages were published to the queues) completing these operations.
After optimizing the processes, consumption and completion (in peek-lock mode) were just fine.

MDB new threads are calling onMessage while previous thread not finished

In JBoss EAP 6 I've got a long-running MDB thread listening to a JMS queue. It receives a TextMessage with a DB key for the work it should process (in a loop).
During its execution I noticed that new threads spawn new MDB instances, leading to inconsistencies. I want to prevent that programmatically or through configuration, without degrading performance - for instance, by checking in onMessage that the work is already ongoing. I can't change the DB model.
Since I'm running in a single VM I'm on the verge (last resort) of using a static Set that stores the DB key. (I'm a bit under time pressure to fix this).
The problem was caused by the fact that I forgot to specify the transaction timeout, so the default timeout kicked in.
The problem was solved by adding the transaction timeout:
@ActivationConfigProperty(propertyName = "transactionTimeout", propertyValue = "10800")
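In context, a minimal sketch of where that property sits on the MDB (the destination and class names here are hypothetical):
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/workQueue"),   // hypothetical queue
    @ActivationConfigProperty(propertyName = "transactionTimeout", propertyValue = "10800")       // seconds, i.e. 3 hours
})
public class WorkProcessorMDB implements MessageListener {
    @Override
    public void onMessage(Message message) {
        // long-running processing keyed by the DB key carried in the TextMessage
    }
}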

PUSH-000503: MultiplexerBlockedException while calling Client.isSubscribed

Sometimes I get the following exception while checking isSubscribed with some topic.
Checking condition : Client.isSubscribed(topic)
Exception : com.pushtechnology.diffusion.multiplexer.MultiplexerBlockedException
This exception manifests in logs/Server.log as PUSH-000503 with the description:
A blocking operation failed because the multiplexers failed to process it within {} milliseconds
The default value for that timeout is 30s, which is a lifetime for a multiplexer to wait. The manual says the following: "This indicates that the server is severely overloaded or deadlocked".
On another note, version v5.5 is unsupported, and you are advised to upgrade before attempting a reproduction.

How to temporarily disable a message listener

What would be a nice, clean way to temporarily disable a message listener? The problem I want to solve is:
A JMS message is received by a message listener
I get an error when trying to process the message.
I wait for my system to get ready again to be able to process the message.
Until my system is ready, I don't want any more messages, so...
...I want to disable the message listener.
My system is ready for processing again.
The failed message gets processed, and the JMS message gets acknowledged.
Enable the message listener again.
Right now, I'm using Sun App Server. I disabled the message listener by setting it to null on the MessageConsumer, and enabled it again using setMessageListener(myOldMessageListener), but after this I don't get any more messages.
How about if you don't return from the onMessage() listener method until your system is ready to process messages again? That'll prevent JMS from delivering another message on that consumer.
That's the async equivalent of not calling receive() in a synchronous case.
There's no multi-threading for a given JMS session, so the pipeline of messages is held up until the onMessage() method returns.
I'm not familiar with the implications of dynamically calling setMessageListener(). The javadoc says there's undefined behavior if it's called "when messages are being consumed by an existing listener or sync consumer". If you're calling it from within onMessage(), it sounds like you're hitting that undefined case.
There are start/stop methods at the Connection level, if that's not too coarse-grained for you.
The problem was solved by a workaround: replacing the message listener with a receive() loop. But I'm still interested in how to disable a message listener and then re-enable it shortly afterwards.
That looks to me like the messages are being delivered, but nothing is happening with them because you have no listener attached. It's been a while since I've done anything with JMS, but don't you want to have the messages sent to a dead-letter queue or something while you fix the system, and then move them back onto the original queue once you're ready for processing again?
On WebLogic you can set up max retries, an error queue to handle messages that exceed the max retry limit, and other parameters. I'm not certain off the top of my head, but you also might be able to specify a wait period. All this is available to you in the admin console. I'd look at the admin for the JMS provider you've got and see if it can do something similar.
In JBoss the following code will do the trick:
MBeanServer mbeanServer = MBeanServerLocator.locateJBoss();
ObjectName objName = new ObjectName("jboss.j2ee:ear=MessageGateway.ear,jar=MessageGateway-EJB.jar,name=MessageSenderMDB,service=EJB3");
JMSContainerInvokerMBean invoker = (JMSContainerInvokerMBean) MBeanProxy.get(JMSContainerInvokerMBean.class, objName, mbeanServer);
invoker.stop(); //Stop MDB
invoker.start(); //Start MDB
I think you can call
messageConsumer.setMessageListener(null);
inside your MessageListener implementation and schedule a re-establishment task (for example, with a ScheduledExecutorService). This task should call
connection.stop();
messageConsumer.setMessageListener(YOUR_NEW_LISTENER);
connection.start();
and it will work. The start() and stop() methods restart the delivery structures (not the TCP connection).
Read the Javadoc https://docs.oracle.com/javaee/7/api/javax/jms/Connection.html#stop--
Temporarily stops a connection's delivery of incoming messages. Delivery can be restarted using the connection's start method. When the connection is stopped, delivery to all the connection's message consumers is inhibited: synchronous receives block, and messages are not delivered to message listeners.
To temporarily stop a connection's delivery of incoming messages, you need to use the stop() method from the Connection interface: https://docs.oracle.com/javaee/7/api/javax/jms/Connection.html#stop--
Just don't call connection.stop() from the MessageListener, because according to the JMS spec you will get a deadlock or an exception. Instead, call connection.stop() from a different thread; you just need to coordinate the MessageListener with the thread that is going to suspend the connection via connection.stop().
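Putting those pieces together, a minimal sketch of the disable-then-re-enable approach inside a MessageListener (assumptions: consumer, connection, and a ScheduledExecutorService named scheduler are fields set up elsewhere; process() and the 30-second delay are illustrative):
@Override
public void onMessage(Message message) {
    try {
        process(message); // hypothetical processing of the message
    } catch (Exception e) {
        try {
            // detach this listener so no further messages are pushed to it
            consumer.setMessageListener(null);
        } catch (JMSException detachError) {
            // log and decide how to proceed
        }
        // re-enable delivery later, from a thread other than the listener's
        scheduler.schedule(() -> {
            try {
                connection.stop();                 // pause delivery while the listener is swapped back in
                consumer.setMessageListener(this); // reattach the original listener
                connection.start();                // resume delivery
            } catch (JMSException reattachError) {
                // log and retry as appropriate
            }
        }, 30, TimeUnit.SECONDS);
    }
}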
