PUSH-000503: MultiplexerBlockedException while calling Client.isSubscribed - java

Sometimes I get the following exception when checking isSubscribed for a topic.
Checking condition : Client.isSubscribed(topic)
Exception : com.pushtechnology.diffusion.multiplexer.MultiplexerBlockedException

This exception manifests in logs/Server.log as PUSH-000503 with the description:
A blocking operation failed because the multiplexers failed to process it within {} milliseconds
The default value for that timeout is 30s, which is a lifetime for a multiplexer to wait. The manual says the following: "This indicates that the server is severely overloaded or deadlocked".
On another note, version v5.5 is unsupported, and you are advised to upgrade before attempting a reproduction.
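In the meantime, here is a minimal defensive sketch (not a definitive fix), assuming `client` and `topic` are as in your question: it treats a blocked multiplexer as "subscription state unknown" rather than letting the exception propagate to the caller.
import com.pushtechnology.diffusion.multiplexer.MultiplexerBlockedException;

boolean subscribed;
try {
    subscribed = client.isSubscribed(topic);
} catch (MultiplexerBlockedException e) {
    // The server failed to service this blocking call within the
    // multiplexer timeout (30 s by default): it is likely overloaded.
    System.err.println("isSubscribed(" + topic + ") blocked: " + e.getMessage());
    subscribed = false; // or back off and retry, depending on your needs
}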

Draining a job with processing_time timer

I'm working with a Dataflow job with stateful processing and a timer. The process, simplified, is as follows:
Receive messages from a Pub/Sub subscription.
Keep the documents in a bagState.
Check with a loop timer (processing_time) whether all conditions are met; if so, clear the bagState and emit a new message to the next step.
Convert and send the message to a Pub/Sub topic.
I haven't set any particular windowing policy, so I'm using the GlobalWindow (as I understand it).
When draining is performed (with messages continuously incoming at 1k/sec; I don't know whether that could be related), the job raises this exception:
Error message from worker: java.lang.IllegalArgumentException: Attempted to set a processing-time timer with an output timestamp of 294247-01-10T04:00:54.775Z that is after the expiration of window 294247-01-09T04:00:54.775Z
with related stacktrace :
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$TimerInternalsTimer.setAndVerifyOutputTimestamp(SimpleDoFnRunner.java:1229)
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$TimerInternalsTimer.setRelative(SimpleDoFnRunner.java:1138)
xxx.xxxxx.xxxxxxxxx.transform.AggregateLogsAuditFn.onLoopIteration(AggregateLogsAuditFn.java:292)
This error occurs when resetting the loop timer in the @ProcessElement method (or likewise in @OnTimer):
loopTimer.offset(Duration.standardSeconds(loopTimerSec.get())).setRelative();
The timer (and its interval) are declared as:
@TimerId("loopTimer") private final TimerSpec loopTimer = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);
private ValueProvider<Integer> latenessTimerSec;
Because an exception is raised, the job won't stop properly, and we need to cancel it.
Please note that updating the job (with --update) is working fine, and this exception never appears when the job is running normally.
Thanks for your advice, Lionel.
Draining advances the watermark to "the end of time" (aka 294247-01-10) and the error seems to indicate that a timer is being set to shortly after this. To make the loop timer compatible with drain, you should avoid setting it if the current time is this high.
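One hedged way to implement that guard (a sketch, not a definitive fix): compare the firing timestamp observed in @OnTimer against the global window's maximum timestamp, and only re-arm the loop while it is still comfortably below it. This assumes the drain-advanced watermark is visible as the timestamp in the timer callback, and reuses loopTimerSec from the question.
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
import org.joda.time.Duration;
import org.joda.time.Instant;

// Inside the DoFn from the question:
@OnTimer("loopTimer")
public void onLoopIteration(OnTimerContext context, @TimerId("loopTimer") Timer loopTimer) {
    // During a drain the watermark jumps to "the end of time", so the
    // firing timestamp approaches the global window's maxTimestamp.
    Instant endOfGlobalWindow = GlobalWindow.INSTANCE.maxTimestamp();
    Duration offset = Duration.standardSeconds(loopTimerSec.get());
    if (context.timestamp().plus(offset).isBefore(endOfGlobalWindow)) {
        loopTimer.offset(offset).setRelative();
    }
    // Otherwise a drain is in progress; let the timer lapse so the
    // job can finish draining instead of throwing.
}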

Send operation timed out when producing events to Azure eventhubs

I have a long-running application that uses the Azure Event Hubs SDK (5.1.0), continually publishing data to an Azure event hub. The service threw the exception below after a few days. What could be the cause of this, and how can we overcome it?
Stack Trace:
Exception in thread "SendTimeout-timer" reactor.core.Exceptions$BubblingException: com.azure.core.amqp.exception.AmqpException: Entity(abc): Send operation timed out, errorContext[NAMESPACE: abc-eventhub.servicebus.windows.net, PATH: abc-metrics, REFERENCE_ID: 70288acf171a4614ab6dcfe2884ee9ec_G2S2, LINK_CREDIT: 210]
at reactor.core.Exceptions.bubble(Exceptions.java:173)
at reactor.core.publisher.Operators.onErrorDropped(Operators.java:612)
at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.onError(FluxTimeout.java:203)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:185)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:251)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onError(FluxHide.java:132)
at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:185)
at com.azure.core.amqp.implementation.ReactorSender$SendTimeout.run(ReactorSender.java:565)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Caused by: com.azure.core.amqp.exception.AmqpException: Entity(abc-metrics): Send operation timed out, errorContext[NAMESPACE: abc-eventhub.servicebus.windows.net, PATH: abc-metrics, REFERENCE_ID: 70288acf171a4614ab6dcfe2884ee9ec_G2S2, LINK_CREDIT: 210]
at com.azure.core.amqp.implementation.ReactorSender$SendTimeout.run(ReactorSender.java:562)
... 2 more
I'm using the Azure Event Hubs Java SDK 5.1.0.
According to the official documentation:
A TimeoutException indicates that a user-initiated operation is taking longer than the operation timeout.
For Event Hubs, the timeout is specified either as part of the connection string, or through ServiceBusConnectionStringBuilder. The error message itself might vary, but it always contains the timeout value specified for the current operation.
Common causes
There are two common causes for this error: incorrect configuration, or a transient service error.
Incorrect configuration: The operation timeout might be too small for the operational condition. The default value for the operation timeout in the client SDK is 60 seconds. Check to see if your code has the value set to something too small. The condition of the network and CPU usage can affect the time it takes for a particular operation to complete, so the operation timeout should not be set to a small value.
Transient service error: Sometimes the Event Hubs service can experience delays in processing requests; for example, during periods of high traffic. In such cases, you can retry your operation after a delay, until the operation is successful. If the same operation still fails after multiple attempts, visit the Azure service status site to see if there are any known service outages.
If you are seeing this error consistently, I would suggest reaching out to Azure support for a deeper look.
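If the cause is a too-small operation timeout, here is a minimal sketch of raising the per-try timeout and retry count on the 5.x producer (the connection string and event hub name are placeholders):
import com.azure.core.amqp.AmqpRetryOptions;
import com.azure.messaging.eventhubs.EventData;
import com.azure.messaging.eventhubs.EventHubClientBuilder;
import com.azure.messaging.eventhubs.EventHubProducerClient;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Collections;

// Raise the per-try timeout (default 60 s) and let the client retry
// transient delays before a send surfaces as a timeout.
AmqpRetryOptions retryOptions = new AmqpRetryOptions()
        .setTryTimeout(Duration.ofSeconds(120))
        .setMaxRetries(5);

EventHubProducerClient producer = new EventHubClientBuilder()
        .connectionString("<connection-string>", "<event-hub-name>")
        .retry(retryOptions)
        .buildProducerClient();

producer.send(Collections.singletonList(
        new EventData("payload".getBytes(StandardCharsets.UTF_8))));
producer.close();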

How to know the error / exception when file upload download transfer state is FAILED

I am using the Transfer Manager available in the AWS SDK for file upload and download. The upload and download methods return Upload and Download objects respectively. I am using the isDone() method to check whether the upload/download is finished. However, isDone() returns true even when the TransferState is FAILED. I need to know the error or exception that caused this failure. How can I do that?
By definition and per the documentation:
isDone: Returns true if this transfer is finished (i.e. completed successfully, failed, or was canceled). Returns false otherwise.
What you may use instead:
waitForCompletion: Waits for this transfer to complete. This is a blocking call; the current thread is suspended until this transfer completes.
Or, better for your case:
waitForException: Waits for this transfer to finish and returns any error that occurred, or returns null if no errors occurred. This is a blocking call; the current thread will be suspended until this transfer either fails or completes successfully.
Both are blocking calls, but they expose the failure details: waitForCompletion throws the exception, while waitForException returns it.
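For example, a minimal sketch (AWS SDK for Java v1; the bucket, key, and file names are placeholders):
import com.amazonaws.AmazonClientException;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;
import java.io.File;

TransferManager tm = TransferManagerBuilder.standard().build();
Upload upload = tm.upload("my-bucket", "my-key", new File("local-file.txt"));
try {
    // Blocks until the transfer finishes; returns the failure cause if
    // the TransferState is FAILED, or null on success.
    AmazonClientException failure = upload.waitForException();
    if (failure != null) {
        System.err.println("Upload failed: " + failure.getMessage());
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
} finally {
    tm.shutdownNow(false); // keep the underlying S3 client alive if shared
}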

How to recover client from "No handler waiting for message" warning?

At medium to high load (test and production), when using the Vert.x Redis client, I get the following warning after a few hundred requests.
2019-11-22 11:30:02.320 [vert.x-eventloop-thread-1] WARN io.vertx.redis.client.impl.RedisClient - No handler waiting for message: [null, 400992, <data from redis>]
As a result, the handler supplied to the Redis call (see below) does not get called and the incoming request times out.
Handler<AsyncResult<String>> handler = res -> {
// success handler
};
redis.get(key, res -> {
handler.handle(res);
});
The real issue is that once the "No handler ..." warning comes up, the Redis client becomes useless, because all further calls to Redis made via the client fail with the same warning, resulting in the handler not getting called. I have an exception handler set on the client to attempt reconnection, but I do not see any reconnection being attempted.
How can one recover from this problem? Any workarounds to alleviate the severity would also be great.
I'm on vertx-core and vertx-redis-client 3.8.1.
The upcoming 4.0 release has addressed this issue, and a release should be happening soon; how soon, I can't really tell.
The problem is that we can't easily backport from the master branch to the 3.8 branch, because a major refactoring has happened on the client and the codebases are very different.
The new code uses a connection pool and has been tested for concurrent access (and this is where the issue you're seeing comes from). Under load, requests are routed across all event loops, and the queue that maintains the state between in-flight requests (requests sent to Redis) and waiting handlers could get out of sync under very specific conditions.
So I'd first try to see whether you can already start moving your code to 4.0. You can try the 4.0.0-milestone3 version, but to be completely safe, run with the latest master, which has more fixes in this area.
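For reference, a hedged sketch of the 4.0-style pooled client described above (option names reflect the 4.x API and may differ slightly in milestone builds):
import io.vertx.core.Vertx;
import io.vertx.redis.client.Command;
import io.vertx.redis.client.Redis;
import io.vertx.redis.client.RedisOptions;
import io.vertx.redis.client.Request;

Vertx vertx = Vertx.vertx();
Redis client = Redis.createClient(vertx, new RedisOptions()
        .setConnectionString("redis://localhost:6379")
        .setMaxPoolSize(8)             // pooled connections across event loops
        .setMaxWaitingHandlers(512));  // bound the in-flight request queue

client.send(Request.cmd(Command.GET).arg("key"))
        .onSuccess(response -> System.out.println(response))
        .onFailure(Throwable::printStackTrace);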

What is the behavior of setting timeout in aggregation pipeline execution in MongoDB Java Driver?

I need to set a timeout on aggregation pipeline execution. I am using MongoDB Java driver 3.2. I know that the code I have to use is the following:
collection.aggregate(pipeline).maxTime(10, TimeUnit.SECONDS);
The problem is that I cannot find anywhere what the behavior of the program is once the timeout is reached. Does it throw an exception? Does it terminate silently, returning a null result?
The official MongoDB documentation says nothing (see cursor.maxTimeMS()). Also the Java API does not refer to any particular behavior (see maxTime).
How is it possible?!
Ok, I've got it. If the execution of the aggregation pipeline exceeds the time set through the maxTime method, a com.mongodb.MongoExecutionTimeoutException is thrown.
The stacktrace of the exception is exactly the following:
com.mongodb.MongoExecutionTimeoutException: operation exceeded time limit
at com.mongodb.connection.ProtocolHelper.createSpecialException(ProtocolHelper.java:157)
at com.mongodb.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:111)
at com.mongodb.connection.CommandProtocol.execute(CommandProtocol.java:114)
at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:159)
at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:286)
at com.mongodb.connection.DefaultServerConnection.command(DefaultServerConnection.java:173)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:215)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:206)
at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:112)
at com.mongodb.operation.AggregateOperation$1.call(AggregateOperation.java:227)
at com.mongodb.operation.AggregateOperation$1.call(AggregateOperation.java:223)
at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:239)
at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:212)
at com.mongodb.operation.AggregateOperation.execute(AggregateOperation.java:223)
at com.mongodb.operation.AggregateOperation.execute(AggregateOperation.java:65)
at com.mongodb.Mongo.execute(Mongo.java:772)
at com.mongodb.Mongo$2.execute(Mongo.java:759)
Hope it helps.
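So a hedged usage sketch looks like this (the collection and the trivial pipeline are placeholders; note the exception surfaces when the cursor is consumed, not when aggregate() is called):
import com.mongodb.MongoExecutionTimeoutException;
import org.bson.Document;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;

List<Document> pipeline = Collections.singletonList(
        new Document("$match", new Document()));
try {
    for (Document doc : collection.aggregate(pipeline)
                                  .maxTime(10, TimeUnit.SECONDS)) {
        System.out.println(doc.toJson()); // replace with real processing
    }
} catch (MongoExecutionTimeoutException e) {
    // The server aborted the aggregation after exceeding maxTime.
    System.err.println("Aggregation timed out: " + e.getMessage());
}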
