I have successfully set up a Flux receiving events from a remote system (the protocol is websocket, but that's irrelevant to the question) and handling connection glitches gracefully using the retryBackoff method. The code (simplified) is something like this:
Flux flux = myEventFlux
        .retryBackoff(Long.MAX_VALUE, Duration.ofSeconds(5))
        .publish()
        .refCount();

flux.subscribe(System.out::println);
Now I'd like to handle connection-lost and connection-recovery events in order to show some cues in the UI, or at least register some logs. Detecting errors seems easy: a doOnError before retryBackoff does the trick. But recovery is another story... I need something like "first successful event after an error", so I've tried this:
flux.next().subscribe( event -> System.out.println("first = " + event) );
It works on the first normal connection (no previous error) but not on subsequent reconnections after errors.
The difficulty with what you want to achieve is that, from the perspective of myEventFlux, there is no way to distinguish two legitimate end-of-line subscribers subscribing to the retrying Flux from one subscriber that triggers two attempts.
So you could use doOnSubscribe (or doFirst since 3.2.10.RELEASE), but it would be subject to the limitation above. It would also trigger on the original connection, not just the retries...
Maybe for the UI use case this would still help?
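For the UI use case, here is a minimal sketch of an alternative that detects "first successful event after an error" with an external flag (the Event type, the imports, and the flag-based approach are my assumptions, not part of the question's code): doOnError marks the connection as lost, and the first doOnNext after a retry reports recovery. Both operators sit before retryBackoff, so they are resubscribed on every retry, while the flag lives outside the pipeline and survives resubscription:

import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import reactor.core.publisher.Flux;

AtomicBoolean recovering = new AtomicBoolean(false);

Flux<Event> flux = myEventFlux
        .doOnError(e -> recovering.set(true))      // connection lost
        .doOnNext(event -> {
            // first successful event after an error
            if (recovering.compareAndSet(true, false)) {
                System.out.println("recovered, first event = " + event);
            }
        })
        .retryBackoff(Long.MAX_VALUE, Duration.ofSeconds(5))
        .publish()
        .refCount();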
I am trying to implement a delay in a Spring Integration Flow.
I have one flow that is starting a process on another server and then I am checking after a delay if that process is completed or not.
When completed the flow should move to the next phase.
This seems to work, but it also shows in the logs (and, clearly, in the flow itself) a long list of repetitions in the runexampleScriptWaiting channel.
I tried removing that channel change but then the flow gets stuck in that phase forever, never moving to completion.
How can I implement this so that a single runexampleScriptWaiting is shown / executed (something similar to a non-blocking while loop, I guess)?
I considered keeping it as is and just updating my monitoring application (a very small frontend that shows which channels are in the payload's history) in order to get rid of the duplicated channel lines, but I also wondered if there is a better / more robust way to do this.
Here's a simplified example:
@Bean
public IntegrationFlow exampleIntegrationFlow() {
    return IntegrationFlows
            .from(exampleConfig.runexampleScript.get())
            .<ExamplePayload>handle((payload, messageHeaders) -> examplePayloadService
                    .changeExampleServiceRequestStatus(payload, ExampleServiceStatus.STARTED))
            .<ExamplePayload>handle(
                    (payload, messageHeaders) -> exampleScriptService.runexample(payload))
            .channel(exampleConfig.runexampleScriptWaiting.get())
            .<ExamplePayload, Boolean>route(jobStatusService::areJobsFinished,
                    router -> router
                            .subFlowMapping(true, exampleSuccessSubflow())
                            .subFlowMapping(false, exampleWaitSubflow())
                            .defaultOutputToParentFlow()
            )
            .get();
}
@Bean
public IntegrationFlow exampleWaitSubflow() {
    return IntegrationFlows
            .from(exampleConfig.runexampleScriptWaiting.get())
            .<ExamplePayload>handle(
                    (payload, messageHeaders) -> {
                        interruptIgnoringSleep(1000);
                        return payload;
                    })
            .channel(exampleConfig.runexampleScriptWaiting.get()) // Commenting this out gets the process stuck
            .get();
}
It is not clear what your exampleConfig.runexampleScriptWaiting.get() is, but what you have so far in the config is not OK. You have two subscribers to the same channel:
.channel(exampleConfig.runexampleScriptWaiting.get()) and the next route()
.from(exampleConfig.runexampleScriptWaiting.get()) and the next handle()
This may cause unexpected behavior, e.g. round-robin message distribution.
I would do filter() and delay() instead, in addition to an ExecutorChannel, since you are asking about non-blocking retry:
.channel(exampleConfig.runexampleScriptWaiting.get())
.filter(jobStatusService::areJobsFinished,
        filter -> filter.discardFlow(
                discardFlow -> discardFlow
                        .delay(1000)
                        .channel(exampleConfig.runexampleScriptWaiting.get())))
The exampleSuccessSubflow could go just after this filter() as part of this flow or via to(exampleSuccessSubflow()).
Pay attention to that discardFlow: we delay the non-finished message a little bit and produce it back to that runexampleScriptWaiting channel for calling this filter again. If you make this channel an ExecutorChannel (or QueueChannel), your wait functionality is going to be non-blocking. But at the same time your main flow is still going to be blocked for this request, since you continue waiting for a reply. Therefore it might not make too much sense to make this filtering logic non-blocking, and you can still use that Thread.sleep() instead of delay().
The router solution may also work, but you cannot use that runexampleScriptWaiting channel as an input of that sub-flow. That is probably the reason behind your "process stuck" problem.
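If you do want the non-blocking variant, a minimal sketch of declaring the waiting channel as an ExecutorChannel (the bean name and the single-thread executor are assumptions for illustration, not from the question's config):

import java.util.concurrent.Executors;

import org.springframework.context.annotation.Bean;
import org.springframework.integration.channel.ExecutorChannel;
import org.springframework.messaging.MessageChannel;

@Bean
public MessageChannel runexampleScriptWaiting() {
    // Messages sent to this channel are handed off to the executor,
    // so the delay/filter loop does not block the producing thread.
    return new ExecutorChannel(Executors.newSingleThreadExecutor());
}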
I am fairly new to developing distributed applications with messaging, and to Spring Cloud Stream in particular. I am currently wondering about best practices on how to deal with errors on the broker side.
In our application, we need to both consume and produce messages from/to multiple sources/destinations like this:
Consumer side
For consuming, we have defined multiple @Beans of type java.util.function.Consumer. The configuration for those looks like this:
spring.cloud.stream.bindings.consumeA-in-0.destination=inputA
spring.cloud.stream.bindings.consumeA-in-0.group=$Default
spring.cloud.stream.bindings.consumeB-in-0.destination=inputB
spring.cloud.stream.bindings.consumeB-in-0.group=$Default
This part works quite well - when starting the application, the exchanges "inputA" and "inputB" as well as the queues "inputA.$Default" and "inputB.$Default" with the corresponding bindings are automatically created in RabbitMQ.
Also, in case of an error (e.g. a queue is suddenly not available), the application gets notified immediately with a QueuesNotAvailableException and continuously tries to re-establish the connection.
My only question here is: Is there some way to handle this exception in code? Or, what are best practices to deal with failures like this on broker side?
Producer side
This one is more problematic. Producing messages is triggered by some internal logic, so we cannot use function @Beans here. Instead, we currently rely on StreamBridge to send messages. The problem is that this approach does not trigger the creation of exchanges and queues on startup. So when our code calls streamBridge.send("outputA", message), the message is sent (the result is true), but it just disappears into the void, since RabbitMQ automatically drops unroutable messages.
I found that with this configuration, I can at least get RabbitMQ to create exchanges and queues as soon as the first message is sent:
spring.cloud.stream.source=produceA;produceB
spring.cloud.stream.default.producer.requiredGroups=$Default
spring.cloud.stream.bindings.produceA-out-0.destination=outputA
spring.cloud.stream.bindings.produceB-out-0.destination=outputB
I need to use streamBridge.send("produceA-out-0", message) in code to make it work, which is not too great since it means having explicit configuration hardcoded, but at least it works.
I also tried to implement the producer in a Reactor style as described in this answer, but in this case the exchange/queue is also not created on application startup, and the sent message just disappears even though the return status of the sending method is "OK".
Failures on the broker side are not registered at all with this approach - when I simulate one, e.g. by deleting the queue or the exchange, it is not registered by the application. Only when another message is sent do I get the following in the logs:
ERROR 21804 --- [127.0.0.1:32404] o.s.a.r.c.CachingConnectionFactory : Shutdown Signal: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'produceA-out-0' in vhost '/', class-id=60, method-id=40)
But still, the result of StreamBridge#send was true in this case, and we need to know that sending actually failed at this point (we persist the state of the sent object using this boolean return value). Is there any way to accomplish that?
Any other suggestions on how to make this producer scenario more robust? Best practices?
EDIT
I found an interesting solution to the producer problem using correlations:
...
CorrelationData correlation = new CorrelationData(UUID.randomUUID().toString());
messageHeaderAccessor.setHeader(AmqpHeaders.PUBLISH_CONFIRM_CORRELATION, correlation);
Message<String> message = MessageBuilder.createMessage(payload, messageHeaderAccessor.getMessageHeaders());
boolean sent = streamBridge.send(channel, message);
try {
    // Wait for the broker's publisher confirm (ack/nack) for this message
    final CorrelationData.Confirm confirm = correlation.getFuture().get(30, TimeUnit.SECONDS);
    if (correlation.getReturned() == null && confirm.isAck()) {
        // success logic
    } else {
        // failed logic
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    // failed logic
} catch (ExecutionException | TimeoutException e) {
    // failed logic
}
using these additional configurations:
spring.cloud.stream.rabbit.default.producer.useConfirmHeader=true
spring.rabbitmq.publisher-confirm-type=correlated
spring.rabbitmq.publisher-returns=true
This seems to work quite well, although I'm still clueless about the return value of StreamBridge#send: it is always true, and I cannot find information about the cases in which it would be false. But the rest is fine; I can get information on issues with the exchange or the queue from the correlation or the confirm.
But this solution is very much focused on RabbitMQ, which causes two problems:
our application should be able to connect to different brokers (e.g. Azure Service Bus)
in tests we use Kafka binder and I don't know how to configure the application context to make it work in this case, too
Any help would be appreciated.
On the consumer side, you can listen for an event such as the ListenerContainerConsumerFailedEvent.
https://docs.spring.io/spring-amqp/docs/current/reference/html/#consumer-events
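A minimal sketch of such an event listener (the class name is an assumption; the event type comes from Spring AMQP as per the docs above):

import org.springframework.amqp.rabbit.listener.ListenerContainerConsumerFailedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
class ConsumerFailureListener {

    @EventListener
    public void onConsumerFailed(ListenerContainerConsumerFailedEvent event) {
        // React to the broker-side failure, e.g. log it or flip a health flag
        System.err.println("Consumer failed: " + event.getReason() + ", fatal=" + event.isFatal());
    }
}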
On the producer side, producers only know about exchanges, not any queues bound to them; hence the requiredGroups property, which causes the queue to be bound.
You only need spring.cloud.stream.default.producer.requiredGroups=$Default - you can send to arbitrary destinations using the StreamBridge, and the infrastructure will be created.
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.function.StreamBridge;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class So70769305Application {

    public static void main(String[] args) {
        SpringApplication.run(So70769305Application.class, args);
    }

    @Bean
    ApplicationRunner runner(StreamBridge bridge) {
        return args -> bridge.send("foo", "test");
    }
}
spring.cloud.stream.default.producer.requiredGroups=$Default
I am reading the documentation about the Channel.basicCancel operation in RabbitMQ (https://www.rabbitmq.com/consumer-cancel.html). The docs say that one possible cancellation case is when the consumer sends a cancel signal on the same channel on which it is listening.
Is this the only possibility? Can you cancel a remote consumer running on a different channel/connection/process?
I am trying to send the cancel request from another process. When I do, it ends with an exception, java.io.IOException: Unknown consumerTag, just as if such an operation were restricted to cancelling local consumers (on one's own channel or connection).
UPDATE:
I noticed that this "Unknown consumerTag" exception is a result of initial validation inside com.rabbitmq.client.impl.ChannelN.basicCancel(String):
Consumer originalConsumer = (Consumer) this._consumers.get(consumerTag);
if (originalConsumer == null) {
    throw new IOException("Unknown consumerTag");
}
...
But still, there might be some RPC call that does the trick...
The RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
The documentation is correct, you must cancel a consumer from its own channel/connection.
Other options include making your consumers aware of "cancellation messages" that will cause them to stop themselves, or using the API to close an entire connection, which will close all channels associated with it.
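A minimal sketch of both options with the RabbitMQ Java client (the queue name and the pre-existing channel/connection variables are assumptions):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.DefaultConsumer;

// Option 1: cancel on the channel that registered the consumer;
// basicCancel only knows consumer tags created on its own channel.
String consumerTag = channel.basicConsume("my-queue", true, new DefaultConsumer(channel));
channel.basicCancel(consumerTag);

// Option 2: close the entire connection; all channels
// (and the consumers on them) are closed with it.
connection.close();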
I'd like to listen on a websocket using akka streams. That is, I'd like to treat it as nothing but a Source.
However, all official examples treat the websocket connection as a Flow.
My current approach is to use the websocketClientFlow in combination with a Source.maybe. This eventually results in the upstream failing with a TcpIdleTimeoutException when no new Messages are being sent down the stream.
Therefore, my question is twofold:
Is there a way – which I obviously missed – to treat a websocket as just a Source?
If using the Flow is the only option, how does one handle the TcpIdleTimeoutException properly? The exception cannot be handled by providing a stream supervision strategy. Restarting the source using a RestartSource doesn't help either, because the source is not the problem.
Update
So I tried two different approaches, setting the idle timeout to 1 second for convenience:
application.conf
akka.http.client.idle-timeout = 1s
Using keepAlive (as suggested by Stefano)
Source.<Message>maybe()
        .keepAlive(Duration.apply(1, "second"), () -> (Message) TextMessage.create("keepalive"))
        .viaMat(Http.get(system).webSocketClientFlow(WebSocketRequest.create(websocketUri)), Keep.right())
{ ... }
When doing this, the Upstream still fails with a TcpIdleTimeoutException.
Using RestartFlow
However, I found out about this approach, using a RestartFlow:
final Flow<Message, Message, NotUsed> restartWebsocketFlow = RestartFlow.withBackoff(
        Duration.apply(3, TimeUnit.SECONDS),
        Duration.apply(30, TimeUnit.SECONDS),
        0.2,
        () -> createWebsocketFlow(system, websocketUri)
);
Source.<Message>maybe()
        .viaMat(restartWebsocketFlow, Keep.right()) // One can treat this part of the resulting graph as a `Source<Message, NotUsed>`
{ ... }
(...)
private Flow<Message, Message, CompletionStage<WebSocketUpgradeResponse>> createWebsocketFlow(
        final ActorSystem system, final String websocketUri) {
    return Http.get(system).webSocketClientFlow(WebSocketRequest.create(websocketUri));
}
This works in that I can treat the websocket as a Source (although artificially, as explained by Stefano) and keep the TCP connection alive by restarting the websocketClientFlow whenever an exception occurs.
This doesn't feel like the optimal solution though.
No. WebSocket is a bidirectional channel, and Akka-HTTP therefore models it as a Flow. If in your specific case you care only about one side of the channel, it's up to you to form a Flow with a "muted" side, by using either Flow.fromSinkAndSource(Sink.ignore, mySource) or Flow.fromSinkAndSource(mySink, Source.maybe), depending on the case.
As per the documentation:
Inactive WebSocket connections will be dropped according to the idle-timeout settings. In case you need to keep inactive connections alive, you can either tweak your idle-timeout or inject ‘keep-alive’ messages regularly.
There is an ad-hoc combinator to inject keep-alive messages, see the example below and this Akka cookbook recipe. NB: this should happen on the client side.
src.keepAlive(1.second, () => TextMessage.Strict("ping"))
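For completeness, a minimal sketch of the "muted side" idea from the first paragraph in the Java DSL (the endpoint URL and the printing sink are assumptions; this listens only and never sends):

import akka.NotUsed;
import akka.actor.ActorSystem;
import akka.http.javadsl.Http;
import akka.http.javadsl.model.ws.Message;
import akka.http.javadsl.model.ws.WebSocketRequest;
import akka.stream.ActorMaterializer;
import akka.stream.Materializer;
import akka.stream.javadsl.Flow;
import akka.stream.javadsl.Sink;
import akka.stream.javadsl.Source;

public class ListenOnlyClient {

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create();
        Materializer materializer = ActorMaterializer.create(system);

        // Receive-only client: the outgoing side is muted with Source.maybe(),
        // incoming messages are consumed by the sink.
        Flow<Message, Message, NotUsed> listenOnly =
                Flow.fromSinkAndSource(
                        Sink.foreach(System.out::println),  // handle incoming messages
                        Source.maybe());                    // never send anything

        Http.get(system).singleWebSocketRequest(
                WebSocketRequest.create("ws://example.com/feed"),  // hypothetical endpoint
                listenOnly,
                materializer);
    }
}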
I hope I understand your question correctly. Are you looking for asSourceOf?
path("measurements") {
entity(asSourceOf[Measurement]) { measurements =>
// measurement has type Source[Measurement, NotUsed]
...
}
}
What would be a good, clean way to temporarily disable a message listener? The problem I want to solve is:

1. A JMS message is received by a message listener.
2. I get an error when trying to process the message.
3. I wait for my system to become ready again to process the message.
4. Until my system is ready, I don't want any more messages, so...
5. ...I want to disable the message listener.
6. My system is ready for processing again.
7. The failed message gets processed, and the JMS message gets acknowledged.
8. Enable the message listener again.
Right now, I'm using Sun App Server. I disabled the message listener by setting it to null on the MessageConsumer, and enabled it again using setMessageListener(myOldMessageListener), but after this I don't get any more messages.
How about if you don't return from the onMessage() listener method until your system is ready to process messages again? That'll prevent JMS from delivering another message on that consumer.
That's the async equivalent of not calling receive() in a synchronous case.
There's no multi-threading for a given JMS session, so the pipeline of messages is held up until the onMessage() method returns.
I'm not familiar with the implications of dynamically calling setMessageListener(). The javadoc says there's undefined behavior if called "when messages are being consumed by an existing listener or sync consumer". If you're calling from within onMessage(), it sounds like you're hitting that undefined case.
There are start/stop methods at the Connection level, if that's not too coarse-grained for you.
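A minimal sketch of the block-inside-onMessage idea from the first paragraph (process and waitUntilSystemReady are hypothetical helpers):

import javax.jms.Message;
import javax.jms.MessageListener;

public class BlockingListener implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            process(message);
        } catch (Exception e) {
            // Block here: no further messages are delivered on this session
            // until onMessage() returns.
            waitUntilSystemReady();
            process(message);  // retry before returning, so the message is acknowledged
        }
    }

    private void process(Message message) { /* ... */ }

    private void waitUntilSystemReady() { /* ... */ }
}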
Problem solved by a workaround: replacing the message listener with a receive() loop. But I'm still interested in how to disable a message listener and enable it again shortly after.
That looks to me like the messages are being delivered but nothing is happening with them, because you have no listener attached. It's been a while since I've done anything with JMS, but don't you want to have the messages sent to a dead letter queue or something while you fix the system, and then move them back onto the original queue once you're ready for processing again?
On WebLogic you can set up max retries, an error queue to handle messages that exceed the max retry limit, and other parameters. I'm not certain off the top of my head, but you might also be able to specify a wait period. All this is available to you in the admin console. I'd look at the admin console for your JMS provider and see if it can do something similar.
In JBoss the following code will do the trick:
MBeanServer mbeanServer = MBeanServerLocator.locateJBoss();
ObjectName objName = new ObjectName("jboss.j2ee:ear=MessageGateway.ear,jar=MessageGateway-EJB.jar,name=MessageSenderMDB,service=EJB3");
JMSContainerInvokerMBean invoker = (JMSContainerInvokerMBean) MBeanProxy.get(JMSContainerInvokerMBean.class, objName, mbeanServer);
invoker.stop(); //Stop MDB
invoker.start(); //Start MDB
I think you can call
messageConsumer.setMessageListener(null);
inside your MessageListener implementation and schedule a re-establishment task (for example in a ScheduledExecutorService). This task should call
connection.stop();
messageConsumer.setMessageListener(YOUR_NEW_LISTENER);
connection.start();
and it will work. The start() and stop() methods restart the delivery structures (not the TCP connection).
Read the Javadoc https://docs.oracle.com/javaee/7/api/javax/jms/Connection.html#stop--
Temporarily stops a connection's delivery of incoming messages. Delivery can be restarted using the connection's start method. When the connection is stopped, delivery to all the connection's message consumers is inhibited: synchronous receives block, and messages are not delivered to message listeners.
To temporarily stop a connection's delivery of incoming messages, you need to use the stop() method of the Connection interface: https://docs.oracle.com/javaee/7/api/javax/jms/Connection.html#stop--
Just don't call connection.stop() from the MessageListener, because according to the JMS spec you will get a deadlock or an exception. Instead, you can call connection.stop() from a different thread; you just need to synchronize the MessageListener and the thread that is going to suspend the connection with connection.stop().
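A minimal sketch of that suspend/resume pattern from a separate thread (waitUntilSystemReady is a hypothetical readiness check):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import javax.jms.Connection;
import javax.jms.JMSException;

public class DeliverySuspender {

    private final ExecutorService controlThread = Executors.newSingleThreadExecutor();

    public void suspendUntilReady(Connection connection) {
        controlThread.execute(() -> {
            try {
                connection.stop();       // inhibits delivery; must not be called from onMessage()
                waitUntilSystemReady();  // hypothetical: block until the system recovers
                connection.start();      // resume delivery to all consumers on this connection
            } catch (JMSException e) {
                // handle / log the failure to suspend or resume
            }
        });
    }

    private void waitUntilSystemReady() { /* ... */ }
}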