Spring Cloud Stream - notice and handle errors in broker - java

I am fairly new to developing distributed applications with messaging, and to Spring Cloud Stream in particular. I am currently wondering about best practices on how to deal with errors on the broker side.
In our application, we need to both consume and produce messages from/to multiple sources/destinations like this:
Consumer side
For consuming, we have defined multiple @Beans of type java.util.function.Consumer. The configuration for those looks like this:
spring.cloud.stream.bindings.consumeA-in-0.destination=inputA
spring.cloud.stream.bindings.consumeA-in-0.group=$Default
spring.cloud.stream.bindings.consumeB-in-0.destination=inputB
spring.cloud.stream.bindings.consumeB-in-0.group=$Default
This part works quite well - when starting the application, the exchanges "inputA" and "inputB" as well as the queues "inputA.$Default" and "inputB.$Default" with the corresponding bindings are automatically created in RabbitMQ.
Also, in case of an error (e.g. a queue is suddenly not available), the application gets notified immediately with a QueuesNotAvailableException and continuously tries to re-establish the connection.
My only question here is: Is there some way to handle this exception in code? Or, what are best practices to deal with failures like this on broker side?
Producer side
This one is more problematic. Producing messages is triggered by some internal logic, so we cannot use function @Beans here. Instead, we currently rely on StreamBridge to send messages. The problem is that this approach does not trigger creation of exchanges and queues on startup. So when our code calls streamBridge.send("outputA", message), the message is sent (result is true), but it just disappears into the void since RabbitMQ automatically drops unroutable messages.
I found that with this configuration, I can at least get RabbitMQ to create exchanges and queues as soon as the first message is sent:
spring.cloud.stream.source=produceA;produceB
spring.cloud.stream.default.producer.requiredGroups=$Default
spring.cloud.stream.bindings.produceA-out-0.destination=outputA
spring.cloud.stream.bindings.produceB-out-0.destination=outputB
I need to use streamBridge.send("produceA-out-0", message) in code to make it work, which is not too great since it means having explicit configuration hardcoded, but at least it works.
I also tried to implement the producer in a Reactor style as described in this answer, but in this case the exchange/queue is also not created on application startup and the sent message just disappears even though the return status of the sending method is "OK".
Failures on the broker side are not registered at all with this approach - when I simulate one e.g. by deleting the queue or the exchange, it is not registered by the application. Only when another message is sent, I get in the logs:
ERROR 21804 --- [127.0.0.1:32404] o.s.a.r.c.CachingConnectionFactory : Shutdown Signal: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'produceA-out-0' in vhost '/', class-id=60, method-id=40)
But still, the result of StreamBridge#send was true in this case. But we need to know that sending did actually fail at this point (we persist the state of the sent object using this boolean return value). Is there any way to accomplish that?
Any other suggestions on how to make this producer scenario more robust? Best practices?
EDIT
I found an interesting solution to the producer problem using correlations:
...
CorrelationData correlation = new CorrelationData(UUID.randomUUID().toString());
messageHeaderAccessor.setHeader(AmqpHeaders.PUBLISH_CONFIRM_CORRELATION, correlation);
Message<String> message = MessageBuilder.createMessage(payload, messageHeaderAccessor.getMessageHeaders());
boolean sent = streamBridge.send(channel, message);
try {
final CorrelationData.Confirm confirm = correlation.getFuture().get(30, TimeUnit.SECONDS);
if (correlation.getReturned() == null && confirm.isAck()) {
// success logic
} else {
// failed logic
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
// failed logic
} catch (ExecutionException | TimeoutException e) {
// failed logic
}
using these additional configurations:
spring.cloud.stream.rabbit.default.producer.useConfirmHeader=true
spring.rabbitmq.publisher-confirm-type=correlated
spring.rabbitmq.publisher-returns=true
This seems to work quite well, although I'm still clueless about the return value of StreamBridge#send, it is always true and I cannot find information in which cases it would be false. But the rest is fine, I can get information on issues with the exchange or the queue from the correlation or the confirm.
But this solution is very much focused on RabbitMQ, which causes two problems:
our application should be able to connect to different brokers (e.g. Azure Service Bus)
in tests we use Kafka binder and I don't know how to configure the application context to make it work in this case, too
Any help would be appreciated.

On the consumer side, you can listen for an event such as the ListenerContainerConsumerFailedEvent.
https://docs.spring.io/spring-amqp/docs/current/reference/html/#consumer-events
On the producer side, producers only know about exchanges, not any queues bound to them; hence the requiredGroups property which causes the queue to be bound.
You only need spring.cloud.stream.default.producer.requiredGroups=$Default - you can send to arbitrary destinations using the StreamBridge and the infrastructure will be created.
@SpringBootApplication
public class So70769305Application {
    public static void main(String[] args) {
        SpringApplication.run(So70769305Application.class, args);
    }
    @Bean
    ApplicationRunner runner(StreamBridge bridge) {
        return args -> bridge.send("foo", "test");
    }
}
spring.cloud.stream.default.producer.requiredGroups=$Default

Related

Safe way to use batch listener

I am trying to use spring-kafka 1.3.x (1.3.3 and 1.3.4). What is not clear is whether there is a safe way to consume messages in batch without skipping a message (or set of messages) when an exception occurs, e.g. a network outage. My preference is also to leverage the container capabilities as much as possible and stay within the Spring framework rather than creating a custom framework to deal with this challenge.
I am setting the following properties onto a ConcurrentMessageListenerContainer :
.setAckOnError(false);
.setAckMode(AckMode.MANUAL);
I am also setting the following kafka specific consumer properties:
enable.auto.commit=false
auto.offset.reset=earliest
If I set a RetryTemplate, I get a class cast exception since it only works for non-batch consumers. Documentation states retry is not available for batch so this may be OK.
I then setup a consumer such as this one:
```java
@KafkaListener(containerFactory = "conatinerFactory",
        groupId = "myGroup",
        topics = "myTopic")
public void onMessage(@Payload List<Entries> batchedData,
        @Header(required = false,
                value = KafkaHeaders.OFFSET) List<Long> offsets,
        Acknowledgment ack) {
    // Pass offsets as a logging argument so the {} placeholder is filled in
    log.info("Working on: {}", offsets);
    int x = 1;
    if (x == 1) {
        log.info("Failure on: {}", offsets);
        throw new RuntimeException("mock failure");
    }
    // do nothing else for now
    // unreachable code
    ack.acknowledge();
}
```
When I send a message into the system to mock the exception above then the only visible action to me is that the listener reports the exception.
When I send another (new) message into the system, the container consumes the new message. The old message is skipped since the offset is advanced to the next offset.
Since I have asked the container not to acknowledge (directly or indirectly), and since there are no other properties that I can see to tell the container not to advance, I am confused why the container does advance.
What I noticed is that for a similar consideration, what is being recommended is to upgrade to 2.1.x and use the container stop capability that was added into the ContainerAware ErrorHandler there.
But what if you are trapped in 1.3.x for the time being, is there a way or missing property that can be used to ensure the container does not advance to the next message or batch of messages?
I can see an option to create a custom framework around the consumer in order to achieve the desired effect. But are there other options that are simpler and more Spring friendly?
Thoughts?
From @garyrussell (spring-kafka github project)
The offset has not been committed but the broker won't send the data again. You have to re-seek the topics/partitions.
2.1 provides the SeekToCurrentBatchErrorHandler which will re-seek automatically for you.
2.0 added consumer-aware listeners, giving you access to the consumer (for seeking) in the listener.
With 1.3.x you have to implement ConsumerSeekAware and perform the seeks yourself (in the listener after catching the exception). Save off the ConsumerSeekCallback in a ThreadLocal.
You will need to add the partitions to your method signature; then seek to the lowest offset in the list for each partition.
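The "lowest offset per partition" step can be sketched in plain Java. This is only an illustration: LowestOffsets and perPartition are hypothetical names, not part of spring-kafka, and the sketch assumes the batch listener receives the partitions and offsets as two parallel lists (one entry per record).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of spring-kafka): finds where to re-seek after a failed batch. */
final class LowestOffsets {

    private LowestOffsets() {
    }

    /**
     * Assumes partitions.get(i) and offsets.get(i) describe the i-th record of the batch.
     * Returns, for each partition, the smallest offset seen in the batch.
     */
    static Map<Integer, Long> perPartition(List<Integer> partitions, List<Long> offsets) {
        if (partitions.size() != offsets.size()) {
            throw new IllegalArgumentException("partitions and offsets must have the same length");
        }
        Map<Integer, Long> lowest = new TreeMap<>();
        for (int i = 0; i < partitions.size(); i++) {
            // Keep the minimum offset encountered so far for this partition
            lowest.merge(partitions.get(i), offsets.get(i), Long::min);
        }
        return lowest;
    }
}
```

In the listener's catch block, one would then loop over the returned map and call seek(topic, partition, offset) on the ConsumerSeekCallback saved earlier, so that the next poll starts again at the first record of the failed batch.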

Http Websocket as Akka Stream Source

I'd like to listen on a websocket using akka streams. That is, I'd like to treat it as nothing but a Source.
However, all official examples treat the websocket connection as a Flow.
My current approach is using the websocketClientFlow in combination with a Source.maybe. This eventually results in the upstream failing due to a TcpIdleTimeoutException, when there are no new Messages being sent down the stream.
Therefore, my question is twofold:
Is there a way – which I obviously missed – to treat a websocket as just a Source?
If using the Flow is the only option, how does one handle the TcpIdleTimeoutException properly? The exception can not be handled by providing a stream supervision strategy. Restarting the source by using a RestartSource doesn't help either, because the source is not the problem.
Update
So I tried two different approaches, setting the idle timeout to 1 second for convenience
application.conf
akka.http.client.idle-timeout = 1s
Using keepAlive (as suggested by Stefano)
Source.<Message>maybe()
.keepAlive(Duration.apply(1, "second"), () -> (Message) TextMessage.create("keepalive"))
.viaMat(Http.get(system).webSocketClientFlow(WebSocketRequest.create(websocketUri)), Keep.right())
{ ... }
When doing this, the Upstream still fails with a TcpIdleTimeoutException.
Using RestartFlow
However, I found out about this approach, using a RestartFlow:
final Flow<Message, Message, NotUsed> restartWebsocketFlow = RestartFlow.withBackoff(
Duration.apply(3, TimeUnit.SECONDS),
Duration.apply(30, TimeUnit.SECONDS),
0.2,
() -> createWebsocketFlow(system, websocketUri)
);
Source.<Message>maybe()
.viaMat(restartWebsocketFlow, Keep.right()) // One can treat this part of the resulting graph as a `Source<Message, NotUsed>`
{ ... }
(...)
private Flow<Message, Message, CompletionStage<WebSocketUpgradeResponse>> createWebsocketFlow(final ActorSystem system, final String websocketUri) {
return Http.get(system).webSocketClientFlow(WebSocketRequest.create(websocketUri));
}
This works in that I can treat the websocket as a Source (although artificially, as explained by Stefano) and keep the TCP connection alive by restarting the websocketClientFlow whenever an Exception occurs.
This doesn't feel like the optimal solution though.
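For readers unfamiliar with the three numeric arguments passed to RestartFlow.withBackoff above (minimum backoff, maximum backoff, random factor), the following plain-Java sketch shows how an exponential backoff with jitter is commonly computed. It is an assumption-laden illustration, not Akka's actual implementation; BackoffSketch and its delay method are hypothetical names, and the random value is passed in explicitly so the result is reproducible.

```java
import java.time.Duration;

/** Hypothetical illustration of exponential backoff with jitter; not Akka's code. */
final class BackoffSketch {

    private BackoffSketch() {
    }

    /**
     * @param restartCount number of restarts so far (0 for the first restart)
     * @param min          smallest delay, e.g. 3 seconds in the question
     * @param max          cap applied before jitter, e.g. 30 seconds in the question
     * @param randomFactor fraction of extra random delay, e.g. 0.2 adds up to 20%
     * @param rnd          a random value in [0, 1), supplied by the caller
     */
    static Duration delay(int restartCount, Duration min, Duration max,
                          double randomFactor, double rnd) {
        // Double the delay on every restart, starting from the minimum
        double exponential = min.toMillis() * Math.pow(2, restartCount);
        // Never exceed the configured maximum
        double capped = Math.min(exponential, max.toMillis());
        // Add jitter so many clients do not reconnect at the same instant
        double jittered = capped * (1.0 + rnd * randomFactor);
        return Duration.ofMillis(Math.round(jittered));
    }
}
```

With the values from the question and no jitter, this yields 3 s, 6 s, 12 s, 24 s and then stays at 30 s.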
No. WebSocket is a bidirectional channel, and Akka-HTTP therefore models it as a Flow. If in your specific case you care only about one side of the channel, it's up to you to form a Flow with a "muted" side, by using either Flow.fromSinkAndSource(Sink.ignore, mySource) or Flow.fromSinkAndSource(mySink, Source.maybe), depending on the case.
as per the documentation:
Inactive WebSocket connections will be dropped according to the
idle-timeout settings. In case you need to keep inactive connections
alive, you can either tweak your idle-timeout or inject ‘keep-alive’
messages regularly.
There is an ad-hoc combinator to inject keep-alive messages, see the example below and this Akka cookbook recipe. NB: this should happen on the client side.
src.keepAlive(1.second, () => TextMessage.Strict("ping"))
I hope I understand your question correctly. Are you looking for asSourceOf?
path("measurements") {
entity(asSourceOf[Measurement]) { measurements =>
// measurement has type Source[Measurement, NotUsed]
...
}
}

Does Spring Integration RabbitTemplate publish to persistent queue by default?

I have a scheduled task that performs the following bit of code:
try {
rabbitTemplate.convertAndSend("TEST");
if (!isOn()) {
turnOn();
}
}
catch (AmqpException e) {
if (isOn()) {
turnOff();
}
}
Everything works just fine. It sends this message to the default "AMQP default" exchange. I do not have a consumer on the other end to consume these messages because I am just ensuring that the server is still alive. Will these messages accumulate over time and cause a memory leak?
Thanks!
K
Do you have a RabbitMQ user interface?
You should be able to see the queues that are being created and whether they are persistent or not. Last time I checked, the default behaviour of Spring AMQP is to create persistent queues.
Have a look at the RabbitMQ Management Plugin: http://www.rabbitmq.com/management.html
Using the RabbitMQ Management Plugin, you can also consume messages that you've published via your code.
Regarding what happens with the messages, they will just pile up and pile up until RabbitMQ hits its limits, then it will no longer accept messages until you purge the queue or consume those messages. With the default RabbitMQ settings, I was able to send about 4 million simple text messages to the queue before it started blocking.

Why do my RabbitMQ channels keep closing?

I'm debugging some Java code that uses Apache POI to pull data out of Microsoft Office documents. Occasionally, it encounters a large document and POI crashes when it runs out of memory. At that point, it tries to publish the error to RabbitMQ, so that other components can know that this step failed and take the appropriate actions. However, when it tries to publish to the queue, it gets a com.rabbitmq.client.AlreadyClosedException (clean connection shutdown; reason: Attempt to use closed channel).
Here's the error handler code:
try {
//Extraction and indexing code
}
catch(Throwable t) {
// Something went wrong! We'll publish the error and then move on with
// our lives
System.out.println("Error received when indexing message: ");
t.printStackTrace();
System.out.println();
String error = PrintExc.format(t);
message.put("error", error);
if(mime == null) {
mime = "application/vnd.unknown";
}
message.put("mime", mime);
publish("IndexFailure", "", MessageProperties.PERSISTENT_BASIC, message);
}
For completeness, here's the publish method:
private void publish(String exch, String route,
AMQP.BasicProperties props, Map<String, Object> message) throws Exception{
chan.basicPublish(exch, route, props,
JSONValue.toJSONString(message).getBytes());
}
I can't find any code within the try block that appears to close the RabbitMQ channel. Are there any circumstances in which the channel could be closed implicitly?
EDIT: I should note that the AlreadyClosedException is thrown by the basicPublish call inside publish.
An AMQP channel is closed on a channel error. Two common things that can cause a channel error:
Trying to publish a message to an exchange that doesn't exist
Trying to publish a message with the immediate flag set that doesn't have a queue with an active consumer set
I would look into setting up a ShutdownListener on the channel you're trying to use to publish a message using the addShutdownListener() to catch the shutdown event and look at what caused it.
Another reason in my case was that I acknowledged a message twice by mistake. After the second acknowledgment, this led to errors like this in the RabbitMQ log:
=ERROR REPORT==== 11-Dec-2012::09:48:29 ===
connection <0.6792.0>, channel 1 - error:
{amqp_error,precondition_failed,"unknown delivery tag 1",'basic.ack'}
After I removed the duplicate acknowledgement, the errors went away, the channel no longer closed, and the AlreadyClosedException was gone as well.
I'd like to add this information for other users who will be searching for this topic
Another possible reason for Receiving a Channel Closed Exception is when Publishers and Consumers are accessing Channel/Queue with different queue declaration/settings
Publisher
channel.queueDeclare("task_queue", durable, false, false, null);
Worker
channel.queueDeclare("task_queue", false, false, false, null);
From RabbitMQ Site
RabbitMQ doesn't allow you to redefine an existing queue with different parameters and will return an error to any program that tries to do that
Apparently, there are many reasons for the AMQP connection and/or channels to close abruptly. In my case, there were too many unacknowledged messages on the queue because the consumer didn't specify the prefetch_count, so the connection was getting terminated every ~1min. Limiting the number of unacknowledged messages by setting the consumer's prefetch count to a non-zero value fixed the problem.
channel.basicQos(100);
For those who wonder why their consuming channels are closing, check if you try to Ack or Nack a delivery more than once.
In the rabbitmq log you would see messages like:
operation basic.ack caused a channel exception precondition_failed:
unknown delivery tag ...
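One way to guard against acknowledging the same delivery twice is to remember which delivery tags have already been settled. The sketch below is a minimal illustration in plain Java; AckOnceGuard is a hypothetical class, not part of the RabbitMQ client, and the action passed in stands for whatever actually performs the ack (for example a wrapper around channel.basicAck(tag, false)).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongConsumer;

/** Hypothetical guard that lets each delivery tag be acknowledged at most once. */
final class AckOnceGuard {

    // Tags that have already been acked; assumes one guard instance per channel,
    // because delivery tags are only unique within a channel.
    private final Set<Long> settled = ConcurrentHashMap.newKeySet();
    private final LongConsumer ackAction;

    AckOnceGuard(LongConsumer ackAction) {
        this.ackAction = ackAction;
    }

    /**
     * Runs the ack action only the first time a tag is seen.
     *
     * @return true if the ack was performed, false if the tag was already settled
     */
    boolean ack(long deliveryTag) {
        if (!settled.add(deliveryTag)) {
            return false; // duplicate ack suppressed
        }
        ackAction.accept(deliveryTag);
        return true;
    }
}
```

Note that this sketch never forgets tags, so a long-lived consumer would need to evict old entries; it is meant to show the idea rather than serve as production code.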
I also had this problem. The reason for my case was that, first I built the queue with durable = false and in the log file I had this error message when I switched durable to true:
"inequivalent arg 'durable' for queue 'logsQueue' in vhost '/':
received 'true' but current is 'false'"
Then, I changed the name of the queue and it worked for me. I assumed that the RabbitMQ server keeps the record of the built queues somewhere and it cannot change the status from durable to non-durable and vice versa.
Again I made durable=false for the new queue and this time I got this error
"inequivalent arg 'durable' for queue 'logsQueue1' in vhost '/':
received 'false' but current is 'true'"
My assumption was true. When I listed the queues in rabbitMQ server by:
rabbitmqctl list_queues
I saw both queues in the server.
To summarize, 2 solutions are:
1. renaming the name of the queue which is not a good solution
2. resetting rabbitMQ by:
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app

Is this a realistic expectation of a distributed mechanism?

I've been evaluating ActiveMQ as a candidate message broker. I've written some test code to try and get an understanding of ActiveMQ's performance limitations.
I can produce a failure state in the broker by sending messages as fast as possible like this:
try {
    while (true) {
        byte[] payload = new byte[(int) (Math.random() * 16384)];
        BytesMessage message = session.createBytesMessage();
        message.writeBytes(payload);
        producer.send(message);
    }
} catch (JMSException ex) { ... }
I was surprised that the line
producer.send(message);
blocks when the broker enters a failed state. I was hoping that some exception would be thrown, so there would be some indication that the broker has failed.
I realize that my test code is spamming the broker, and I expect the broker to fail. However, I would prefer that the broker failed "loudly" as opposed to simply blocking.
Is this an unrealistic expectation?
Update:
Uri's answer references an ActiveMQ bug report that was filed in March. The bug description includes a proposal that sounds like what I'm looking for: "if the request on the transport had a timeout (this is to catch failure scenarios, so something that's not expected to reasonably happen), things would have errored out rather than building waiting threads."
However, after 8 months the bug is currently unassigned with a single vote. So I guess the question still stands, is this something ActiveMQ should (will?) implement?
You are testing the 'slow consumer' and producer flowcontrol issue all message brokers have to deal with. Do you wanna fail producers, block them or spool to disk?
Basically the out of the box default in ActiveMQ is to block producers. But you can configure message cursors to spool to disk.
BTW you've not said if you are using queues/topics or persistent/non-persistent; if you are using non persistent topics there are other strategies you can use for discarding messages etc.
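The difference between blocking a producer and failing it can be illustrated with a bounded in-memory buffer. This is only an in-process analogy in plain Java, not ActiveMQ configuration; BoundedSender and sendOrFail are hypothetical names. Calling put on a full BlockingQueue would wait indefinitely (the "block producers" behaviour), whereas offer with a timeout lets the caller fail loudly.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-process analogy for producer flow control; not ActiveMQ code. */
final class BoundedSender {

    private final BlockingQueue<byte[]> buffer;

    BoundedSender(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Tries to enqueue the payload and fails loudly if the buffer stays full.
     * Using buffer.put(payload) here instead would block without a time limit.
     */
    void sendOrFail(byte[] payload, long timeoutMillis) {
        try {
            if (!buffer.offer(payload, timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException(
                        "buffer still full after " + timeoutMillis + " ms");
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the send as failed
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for space", e);
        }
    }

    /** Number of payloads waiting in the buffer. */
    int pending() {
        return buffer.size();
    }
}
```

With nothing draining the buffer, the call made once the capacity is reached throws after the timeout instead of hanging, which is the "loud" failure the question asks for.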
Apparently there's a known issue, not sure if it's been fixed:
https://issues.apache.org/activemq/browse/AMQ-1625
Not sure about ActiveMQ config, but other JMS providers have various configuration options - so you may be able to get ActiveMQ to do as you wish in that situation.
I know Fiorano has options to specify whether providers block or not in this situation.
